I've read a lot of articles about undefined behavior (UB), but they all talk about theory. I am wondering what can happen in practice, because programs containing UB do actually run.
My questions relate to Unix-like systems, not embedded systems.
I know that one should not write code that relies on undefined behavior. Please do not send answers like this:
Everything could happen
Daemons can fly out of your nose
Computer could jump and catch fire
Especially the first one is not true: you obviously cannot get root by doing a signed integer overflow. I'm asking this for educational purposes only.
Question A)
Source
implementation-defined behavior: unspecified behavior where each implementation documents how the choice is made
Is the implementation the compiler?
Question B)
*"abc" = '\0';
For something other than a segfault to happen, does my system need to be broken? What could actually happen, even if it is not predictable? Could the first byte be set to zero? What else, and how?
Question C)
int i = 0;
foo(i++, i++, i++);
This is UB because i is modified more than once with no intervening sequence point; the order in which the arguments are evaluated is also unspecified. Right. But when the program runs, who decides in what order the arguments are evaluated: is it the compiler, the OS, or something else?
Question D)
Source
$ cat test.c
#include <stdio.h>
#include <limits.h>
int main (void)
{
    printf ("%d\n", (INT_MAX+1) < 0);
    return 0;
}
$ cc test.c -o test
$ ./test
Formatting root partition, chomp chomp
According to other SO users, this is possible. How could this happen? Do I need a broken compiler?
Question E)
Use the same code as above. What could actually happen, apart from the expression (INT_MAX+1) yielding a random value?
Question F)
Does the GCC -fwrapv option define the behavior of signed integer overflow, or does it only make GCC assume that it will wrap around, while at run time it might in fact not wrap around?
Question G)
This one concerns embedded systems. Of course, if the PC jumps to an unexpected place, two outputs could be wired together and create a short-circuit (for example).
But, when executing code similar to this:
*"abc" = '\0';
Wouldn't the PC be vectored to the general exception handler? Or what am I missing?
In practice, most compilers handle undefined behavior in one of the following ways:
Print a warning at compile time, to inform the user that he probably made a mistake
Infer properties on the values of variables and use those to simplify code
Perform unsafe optimizations, as long as they only break the expected semantics of code whose behavior is undefined
Compilers are usually not designed to be malicious. The main reason to exploit undefined behavior is usually to get some performance benefit from it. But sometimes that can involve total dead code elimination.
A) Yes. The compiler should document which behavior it chose. Even so, the consequences of UB are usually hard to predict or explain.
B) If the string is actually instantiated in memory and sits in a writable page (by default it will be in a read-only page), then its first character might become a null character. More probably, the entire expression will be thrown out as dead code, because it is a temporary value that is never used afterwards.
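If the goal was actually to modify the string, the well-defined approach is to copy the literal into a writable array first. A minimal sketch (the variable name is mine):
#include <stdio.h>

int main(void) {
    char buf[] = "abc";  /* the array holds a writable copy of the literal */
    buf[0] = '\0';       /* well-defined: buf is now an empty string */
    /* *"abc" = '\0'; would write into the literal itself: undefined
       behavior, typically a segfault since the literal is read-only */
    puts(buf);
    return 0;
}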
C) Usually, the order of evaluation is decided by the compiler. Here it might decide to transform the call into i += 3 (or into i = undef if it is being silly). The CPU may reorder instructions at run time, but it must preserve the semantics of its instruction set, so the order chosen by the compiler is effectively preserved (the compiler usually cannot forward the C semantics further down). An increment of a register cannot commute with, or be executed in parallel with, another increment of that same register.
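If the intended effect was "pass three consecutive values and advance i", a well-defined rewrite might look like this (a sketch, assuming foo takes three ints):
int i = 0;
foo(i, i + 1, i + 2);  /* only reads of i: no unsequenced modification */
i += 3;                /* one well-sequenced update */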
D) You would need a silly compiler that prints "Formatting root partition, chomp chomp" when it detects undefined behavior. More probably, it will print a warning at compile time, replace the expression with a constant of its choice, and produce a binary that simply performs the print with that constant.
E) It is a syntactically correct program, so the compiler will certainly produce a "working" binary. That binary could in theory behave like any binary you might download from the internet and run. More probably, you will get a binary that exits straight away, or that prints the aforementioned message and then exits.
F) It tells GCC to treat signed integer overflow as well defined, wrapping around with two's-complement semantics. GCC must therefore produce a binary that wraps around at run time. That is rather easy, because most architectures have those semantics anyway. The reason C makes overflow UB is so that compilers can assume a + 1 > a, which is critical for proving that loops terminate and/or for predicting branches. That is why using a signed integer as a loop induction variable can lead to faster code, even though it maps to the exact same instructions in hardware.
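To see the difference, you can compile a snippet like the following with and without -fwrapv (a sketch; the exact result at each optimization level is compiler-dependent):
/* wrap.c -- try: gcc -O2 wrap.c && ./a.out
             then: gcc -O2 -fwrapv wrap.c && ./a.out */
#include <limits.h>
#include <stdio.h>

int always_greater(int a) {
    return a + 1 > a;  /* without -fwrapv, GCC may fold this to 1 */
}

int main(void) {
    /* with -fwrapv this must print 0: INT_MAX + 1 wraps to INT_MIN */
    printf("%d\n", always_greater(INT_MAX));
    return 0;
}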
G) Undefined behavior is undefined behavior. The produced binary could indeed run any instructions, including a jump to an unspecified place... or cleanly trigger a trap. Most probably, your compiler will simply get rid of the unnecessary operation.
You obviously cannot get root by doing a signed integer overflow.
Why not?
If you assume that signed integer overflow can only yield some particular value, then you're unlikely to get root that way. But the thing about undefined behavior is that an optimizing compiler can assume that it doesn't happen, and generate code based on that assumption.
Operating systems have bugs. Exploiting those bugs can, among other things, invoke privilege escalation.
Suppose you use signed integer arithmetic to compute an index into an array. If the computation overflows, you could accidentally clobber some arbitrary chunk of memory outside the intended array. That could cause your program to do arbitrarily bad things.
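A contrived sketch of that scenario (the names and layout are hypothetical; nothing guarantees what actually sits next to the array):
int table[16];
int secret;  /* hypothetical sensitive data that may sit near the array */

void store(int base, int offset) {
    /* If base + offset overflows an int, the behavior is undefined; in
       practice the computed index may be negative or huge, so the write
       can land far outside table[] and clobber unrelated memory. */
    table[base + offset] = 42;
}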
If a bug can be exploited deliberately (and the existence of malware clearly indicates that that's possible), it's at least possible that it could be exploited accidentally.
Also, consider this simple contrived program:
#include <stdio.h>
#include <limits.h>
int main(void) {
    int x = INT_MAX;
    if (x < x + 1) {
        puts("Code that gets root");
    }
    else {
        puts("Code that doesn't get root");
    }
}
On my system, it prints
Code that doesn't get root
when compiled with gcc -O0 or gcc -O1, and
Code that gets root
with gcc -O2 or gcc -O3.
I don't have concrete examples of signed integer overflow triggering a security flaw (and I wouldn't post such an example if I had one), but it's clearly possible.
Undefined behavior can in principle make your program accidentally do anything that a program starting with the same privileges could do deliberately. Unless you're using a bug-free operating system, that could include privilege escalation, erasing your hard drive, or sending a nasty e-mail message to your boss.
To my mind, the worst thing that can happen in the face of undefined behavior is for the program to do something different tomorrow.
I enjoy programming, but I also enjoy finishing a program, and going on to work on something else. I do not delight in continuously tinkering with my already-written programs, to keep them working in the face of bugs they spontaneously develop as hardware, compilers, or other circumstances keep changing.
So when I write a program, it is not enough for it to work. It has to work for the right reasons. I have to know that it works, and that it will keep working next week and next month and next year. It can't just seem to work, to have given apparently correct answers on the -- necessarily finite -- set of test cases I've run it on so far.
And that's why undefined behavior is so pernicious: it might do something perfectly fine today, and then do something completely different tomorrow, when I'm not around to defend it. The behavior might change because someone ran it on a slightly different machine, or with more or less memory, or on a very different set of inputs, or after recompiling it with a different compiler.
See also the third part of this other answer (the part starting with "And now, one more thing, if you're still with me").
It used to be that you could count on the compiler to do something "reasonable". More and more often, though, compilers are truly taking advantage of their license to do weird things when you write undefined code. In the name of efficiency, these compilers are introducing very strange optimizations, which don't do anything close to what you probably want.
Read these posts:
Linus Torvalds describes a kernel bug that was much worse than it could have been given that gcc took advantage of undefined behavior
LLVM blog post on undefined behavior (first of three parts, also two, three)
another great blog post by John Regehr (also first of three parts: two, three)
int p=4, s=5;
int m;
m = (p=s) * (p==s);
printf("%d",m);
How is this expression evaluated?
What will be the value of m?
How do the parentheses change the precedence of operators in this example?
I am getting m=4 in Turbo C.
Is this a code fragment that actually came up in your work, or is it an assignment someone gave you?
The expression contains two parts:
p=s /* in this part p's value is assigned */
p==s /* in this part p's value is used */
So before we can figure out what the value of the expression is, we have to figure out: does p's value get set before or after it gets used?
And the answer is -- I'm going to shout a little here -- WE DO NOT KNOW.
Let me say that again. We simply do not know whether p's value gets set before or after it gets used. So we have no way of predicting what value this expression will evaluate to.
Now, you might think that precedence will tell you whether p gets set before or after it gets used. But it turns out that, no, in this case precedence does not tell you whether p gets set before or after it gets used.
You might think that associativity will tell you whether p gets set before or after it gets used. But it turns out that, no, in this case associativity does not tell you whether p gets set before or after it gets used, either.
Finally, you might think that I'm wrong, that precedence and/or associativity have to be able to tell you what you need to know, that there has to be a way of figuring out what this expression does. But it turns out that I'm right: there is no way of figuring out what this expression does.
You could compile it and try it, but that will only tell you how the compiler you're using today chooses to evaluate it. I guarantee you that there's another compiler out there that will evaluate it differently.
You might ask, which compiler is right? And the answer is, they're both right (or at least, neither of them is wrong). As far as the C language is concerned, this expression is undefined. So there's no right answer, and your compiler isn't wrong no matter what it does.
If this is code that came up in your work, please delete it right away, and figure out what you were really trying to do, and figure out some cleaner, well-defined way of expressing it. And if this code is an assignment, your answer is simply: it's undefined. (If your instructor believes this expression has a well-defined result, you're in the unfortunate position of having an instructor who doesn't know what he's talking about.)
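For example, if the intent was "assign s to p and then multiply by the result of the comparison", one well-defined way to write it (a guess at the intent, of course) is to split the statement so the order is explicit:
p = s;            /* the assignment happens first, unambiguously */
m = p * (p == s); /* then the multiplication; p == s is now always 1 */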
You're reading and writing the variable p multiple times in the same expression without a sequence point, which means the compiler is free to evaluate the two sub-expressions in any order.
This results in undefined behavior, meaning the program's behavior is unpredictable.
The line
a = a++;
is undefined behaviour in C. The question I am asking is: why?
I mean, I get that it might be hard to provide a consistent order in which things should be done. But, certain compilers will always do it in one order or the other (at a given optimization level). So why exactly is this left up to the compiler to decide?
To be clear, I want to know if this was a design decision and if so, what prompted it? Or maybe there is a hardware limitation of some kind?
UPDATE: This question was the subject of my blog on June 18th, 2012. Thanks for the great question!
Why? I want to know if this was a design decision and if so, what prompted it?
You are essentially asking for the minutes of the meeting of the ANSI C design committee, and I don't have those handy. If your question can only be answered definitively by someone who was in the room that day, then you're going to have to find someone who was in that room.
However, I can answer a broader question:
What are some of the factors that lead a language design committee to leave the behaviour of a legal program (*) "undefined" or "implementation defined" (**)?
The first major factor is: are there two existing implementations of the language in the marketplace that disagree on the behaviour of a particular program? If FooCorp's compiler compiles M(A(), B()) as "call A, call B, call M", and BarCorp's compiler compiles it as "call B, call A, call M", and neither is the "obviously correct" behaviour then there is strong incentive to the language design committee to say "you're both right", and make it implementation defined behaviour. Particularly this is the case if FooCorp and BarCorp both have representatives on the committee.
The next major factor is: does the feature naturally present many different possibilities for implementation? For example, in C# the compiler's analysis of a "query comprehension" expression is specified as "do a syntactic transformation into an equivalent program that does not have query comprehensions, and then analyze that program normally". There is very little freedom for an implementation to do otherwise.
By contrast, the C# specification says that the foreach loop should be treated as the equivalent while loop inside a try block, but allows the implementation some flexibility. A C# compiler is permitted to say, for example "I know how to implement foreach loop semantics more efficiently over an array" and use the array's indexing feature rather than converting the array to a sequence as the specification suggests it should.
A third factor is: is the feature so complex that a detailed breakdown of its exact behaviour would be difficult or expensive to specify? The C# specification says very little indeed about how anonymous methods, lambda expressions, expression trees, dynamic calls, iterator blocks and async blocks are to be implemented; it merely describes the desired semantics and some restrictions on behaviour, and leaves the rest up to the implementation.
A fourth factor is: does the feature impose a high burden on the compiler to analyze? For example, in C# if you have:
Func<int, int> f1 = (int x)=>x + 1;
Func<int, int> f2 = (int x)=>x + 1;
bool b = object.ReferenceEquals(f1, f2);
Suppose we require b to be true. How are you going to determine when two functions are "the same"? Doing an "intensionality" analysis -- do the function bodies have the same content? -- is hard, and doing an "extensionality" analysis -- do the functions have the same results when given the same inputs? -- is even harder. A language specification committee should seek to minimize the number of open research problems that an implementation team has to solve!
In C# this is therefore left to be implementation-defined; a compiler can choose to make them reference equal or not at its discretion.
A fifth factor is: does the feature impose a high burden on the runtime environment?
For example, in C# dereferencing past the end of an array is well-defined; it produces an array-index-was-out-of-bounds exception. This feature can be implemented with a small -- not zero, but small -- cost at runtime. Calling an instance or virtual method with a null receiver is defined as producing a null-was-dereferenced exception; again, this can be implemented with a small, but non-zero cost. The benefit of eliminating the undefined behaviour pays for the small runtime cost.
A sixth factor is: does making the behaviour defined preclude some major optimization? For example, C# defines the ordering of side effects when observed from the thread that causes the side effects. But the behaviour of a program that observes side effects of one thread from another thread is implementation-defined except for a few "special" side effects. (Like a volatile write, or entering a lock.) If the C# language required that all threads observe the same side effects in the same order then we would have to restrict modern processors from doing their jobs efficiently; modern processors depend on out-of-order execution and sophisticated caching strategies to obtain their high level of performance.
Those are just a few factors that come to mind; there are of course many, many other factors that language design committees debate before making a feature "implementation defined" or "undefined".
Now let's return to your specific example.
The C# language does make that behaviour strictly defined(†); the side effect of the increment is observed to happen before the side effect of the assignment. So there cannot be any "well, it's just impossible" argument there, because it is possible to choose a behaviour and stick to it. Nor does this preclude major opportunities for optimizations. And there are not a multiplicity of possible complex implementation strategies.
My guess, therefore, and I emphasize that this is a guess, is that the C language committee left the ordering of side effects undefined because there were multiple compilers in the marketplace that did it differently, none was clearly "more correct", and the committee was unwilling to tell half of them that they were wrong.
(*) Or, sometimes, its compiler! But let's ignore that factor.
(**) "Undefined" behaviour means that the code can do anything, including erasing your hard disk. The compiler is not required to generate code that has any particular behaviour, and not required to tell you that it is generating code with undefined behaviour. "Implementation defined" behaviour means that the compiler author is given considerable freedom in choice of implementation strategy, but is required to pick a strategy, use it consistently, and document that choice.
(†) When observed from a single thread, of course.
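To put the footnote in C terms, here is a minimal sketch contrasting the two categories (assuming a typical hosted implementation):
#include <stdio.h>

int main(void) {
    int a = -4;
    /* Implementation-defined: each compiler must choose and document the
       result of right-shifting a negative value; most implementations
       use an arithmetic shift and print -2. */
    printf("%d\n", a >> 1);

    /* Undefined: a is modified twice with no intervening sequence point,
       so the standard places no requirements at all on what happens. */
    /* a = a++; */
    return 0;
}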
It's undefined because there is no good reason for writing code like that, and by not requiring any specific behaviour for bogus code, compilers can more aggressively optimize well-written code. For example, *p = i++ may be optimized in a way that causes a crash if p happens to point to i, possibly because two cores write to the same memory location at the same time. The fact that this also happens to be undefined in the specific case that *p is explicitly written out as i, to get i = i++, logically follows.
It's ambiguous but not syntactically wrong. What should a be? Both = and ++ have the same "timing". So instead of defining an arbitrary order, it was left undefined, since either order would conflict with one of the two operators' definitions.
With a few exceptions, the order in which expressions are evaluated is unspecified; this was a deliberate design decision, and it allows implementations to rearrange the evaluation order from what's written if that will result in more efficient machine code. Similarly, the order in which the side effects of ++ and -- are applied is unspecified beyond the requirement that it happen before the next sequence point, again to give implementations the freedom to arrange operations in an optimal manner.
Unfortunately, this means that the result of an expression like a = a++ will vary based on the compiler, compiler settings, surrounding code, etc. The behavior is specifically called out as undefined in the language standard so that compiler implementors don't have to worry about detecting such cases and issuing a diagnostic against them. Cases like a = a++ are obvious, but what about something like
void foo(int *a, int *b)
{
    *a = (*b)++;
}
If that's the only function in the file (or if its caller is in a different file), there's no way to know at compile time whether a and b point to the same object; what do you do?
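For example, a caller (possibly in another file) might do either of the following, and when compiling foo alone there is no way to tell which:
int i = 5, j = 7;
foo(&i, &j);  /* fine: *a and *b are distinct objects */
foo(&i, &i);  /* now *a = (*b)++ is effectively i = i++ -- undefined */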
Note that it's entirely possible to mandate that all expressions be evaluated in a specific order, and that all side effects be applied at a specific point in evaluation; that's what Java and C# do, and in those languages expressions like a = a++ are always well-defined.
The postfix ++ operator returns the value prior to the increment. So, at the first step, a gets assigned its old value (that's what ++ returns). Then it is undefined whether the increment or the assignment takes place first, because both operations are applied to the same object (a), and the language says nothing about the order in which they happen.
Somebody may provide another reason, but from an optimization (or rather, code-generation) point of view, a needs to be loaded into a CPU register, and the postfix operator's value has to be placed into another register, or the same one.
So the final assignment can depend on whether the optimizer uses one register or two.
Updating the same object twice without an intervening sequence point is undefined behaviour ...
because that makes compiler writers happier
because it allows implementations to define it anyway
because it doesn't force a specific constraint when it isn't needed
Suppose a is a pointer with value 0x0001FFFF. And suppose the architecture is segmented so that the compiler needs to apply the increment to the high and low parts separately, with a carry between them. The optimiser could conceivably reorder the writes so that the final value stored is 0x0002FFFF; that is, the low part before the increment and the high part after the increment.
This value is far from either value you might have expected. It may point to memory not owned by the application, or it may (in general) be a trapping representation. In other words, the CPU may raise a hardware fault as soon as this value is loaded into a register, crashing the application. Even if it doesn't cause an immediate crash, it is a profoundly wrong value for the application to be using.
The same kind of thing can happen with other basic types, and the C language allows even ints to have trapping representations. C tries to allow efficient implementation on a wide range of hardware. Getting efficient code on a segmented machine such as the 8086 is hard. By making this undefined behaviour, a language implementer has a bit more freedom to optimise aggressively. I don't know if it has ever made a performance difference in practice, but evidently the language committee wanted to give every benefit to the optimiser.
Our class was asked this question by the C programming prof:
You are given the code:
int x=1;
printf("%d",++x,x+1);
What output will it always produce ?
Most students said undefined behavior. Can anyone help me understand why it is so?
Thanks for the edit and the answers but I'm still confused.
The output is likely to be 2 in every reasonable case. In reality, what you have is undefined behavior though.
Specifically, the standard says:
Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.
There is a sequence point before evaluating the arguments to a function, and a sequence point after all the arguments have been evaluated (but the function not yet called). Between those two (i.e., while the arguments are being evaluated) there is no sequence point (unless an argument is an expression that includes one internally, such as one using the &&, ||, or comma operator).
That means the call to printf is reading the prior value both to determine the value being stored (i.e., the ++x) and to determine the value of the second argument (i.e., the x+1). This clearly violates the requirement quoted above, resulting in undefined behavior.
The fact that you've provided an extra argument for which no conversion specifier is given does not result in undefined behavior. If you supply fewer arguments than conversion specifiers, or if the (promoted) type of an argument disagrees with that of its conversion specifier, you get undefined behavior -- but passing an extra argument does not.
Any time the behavior of a program is undefined, anything can happen — the classical phrase is that "demons may fly out of your nose" — although most implementations don't go that far.
The arguments of a function are conceptually evaluated in parallel (the technical term is that there is no sequence point between their evaluation). That means the expressions ++x and x+1 may be evaluated in this order, in the opposite order, or in some interleaved way. When you modify a variable and try to access its value in parallel, the behavior is undefined.
With many implementations, the arguments are evaluated in sequence (though not always from left to right). So you're unlikely to see anything but 2 in the real world.
However, a compiler could generate code like this:
1. Load x into register r1.
2. Calculate x+1 by adding 1 to r1.
3. Calculate ++x by adding 1 to r1. That's OK because x has been loaded into r1: given how the compiler was designed, step 2 cannot have modified r1, because that could only happen if x were read as well as written between two sequence points, which is forbidden by the C standard.
4. Store r1 into x.
And on this (hypothetical, but correct) compiler, the program would print 3.
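For completeness, the intended output can be obtained in a well-defined way by sequencing the increment before the call, e.g.:
int x = 1;
++x;                     /* the increment completes before the call */
printf("%d", x, x + 1);  /* prints 2; the extra argument is harmless */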
(EDIT: passing an extra argument to printf is correct (§7.19.6.1-2 in N1256); thanks to Prasoon Saurav for pointing this out. Also: added an example.)
The correct answer is: the code produces undefined behavior.
The reason the behavior is undefined is that the two expressions ++x and x + 1 are modifying x and reading x for an unrelated (to modification) reason and these two actions are not separated by a sequence point. This results in undefined behavior in C (and C++). The requirement is given in 6.5/2 of C language standard.
Note, that the undefined behavior in this case has absolutely nothing to do with the fact that printf function is given only one format specifier and two actual arguments. To give more arguments to printf than there are format specifiers in the format string is perfectly legal in C. Again, the problem is rooted in the violation of expression evaluation requirements of C language.
Also note, that some participants of this discussion fail to grasp the concept of undefined behavior, and insist on mixing it with the concept of unspecified behavior. To better illustrate the difference let's consider the following simple example
int inc_x(int *x) { return ++*x; }
int x_plus_1(int x) { return x + 1; }
int x = 1;
printf("%d", inc_x(&x), x_plus_1(x));
The above code is "equivalent" to the original one, except that the operations that involve our x are wrapped into functions. What is going to happen in this latest example?
There's no undefined behavior in this code. But since the order of evaluation of printf arguments is unspecified, this code produces unspecified behavior, i.e. it is possible that printf will be called as printf("%d", 2, 2) or as printf("%d", 2, 3). In both cases the output will indeed be 2. However, the important difference of this variant is that all accesses to x are wrapped into sequence points present at the beginning and at the end of each function, so this variant does not produce undefined behavior.
This is exactly the reasoning some other posters are trying to force onto the original example. But it cannot be done. The original example produces undefined behavior, which is a completely different beast. They are apparently trying to insist that in practice undefined behavior is always equivalent to unspecified behavior. This is a totally bogus claim that only indicates a lack of expertise in those who make it. The original code produces undefined behavior, period.
To continue with the example, let's modify the previous code sample to
printf("%d %d", inc_x(&x), x_plus_1(x));
the output of the code becomes generally unpredictable. It can print 2 2 or it can print 2 3. However, note that even though the behavior is unpredictable, it still does not produce undefined behavior. The behavior is unspecified, but not undefined. Unspecified behavior is restricted to two possibilities: either 2 2 or 2 3. Undefined behavior is not restricted to anything: it can format your hard drive instead of printing something. Feel the difference.
Most students said undefined behavior. Can anyone help me understand why it is so?
Because the order in which function arguments are evaluated is not specified.
What output will it always produce ?
It will produce 2 in all environments I can think of. Strict interpretation of the C99 standard however renders the behaviour undefined because the accesses to x do not meet the requirements that exist between sequence points.
Most students said undefined behavior. Can anyone help me understand why it is so?
I will now address the second question, which I understand as "Why do most of the students in my class say that the shown code constitutes undefined behaviour?", and which I think no other poster has answered so far. Some of the students will have remembered examples of expressions with undefined value, like
f(++i,i)
The code you give fits this pattern, but those students erroneously think the behaviour is defined anyway because printf ignores the last argument. This nuance confuses many students. Other students will be as well versed in the standard as David Thornley and say "undefined behaviour" for the correct reasons explained above.
The points made about undefined behavior are correct, but there is one additional wrinkle: printf may fail. It's doing file IO; there are any number of reasons it could fail, and it's impossible to eliminate them without knowing the complete program and the context in which it will be executed.
Echoing codaddict the answer is 2.
printf will be called with argument 2 and it will print it.
If this code is put in a context like:
void do_something()
{
    int x=1;
    printf("%d",++x,x+1);
}
Then the behaviour of that function is completely and unambiguously defined. I'm not of course arguing that this is good or correct or that the value of x is determinable afterwards.
The output will always be 2 (for 99.98% of the most important standard-compliant compilers and systems).
According to the standard, this seems to be, by definition, "undefined behaviour": a definition/answer that is self-justifying and says nothing about what can actually happen, and especially why.
The utility splint (which is not a standards-compliance checking tool), and so splint's programmers, consider this "unspecified behaviour". This means, basically, that the evaluation of (x+1) can give 1+1 or 2+1, depending on when the update of x is actually done. Since the expression is discarded anyway (the printf format consumes only one argument), the output is unaffected, and we can still say it is 2.
undefined.c:7:20: Argument 2 modifies x, used by argument 3 (order of
evaluation of actual parameters is undefined): printf("%d\n", ++x, x + 1)
Code has unspecified behavior. Order of evaluation of function parameters or
subexpressions is not defined, so if a value is used and modified in
different places not separated by a sequence point constraining evaluation
order, then the result of the expression is unspecified.
As said before, the unspecified behaviour affects just the evaluation of (x+1), not the whole statement or its other expressions. So in the case of "unspecified behaviour" we can say the output is 2, and nobody could object.
But this is not unspecified behaviour; it seems to be "undefined behaviour". And "undefined behaviour" seems to be something that affects the whole statement rather than the single expression. This is due to the mystery around where the "undefined behaviour" actually occurs (i.e. what exactly it affects).
If there were reasons to attach the "undefined behaviour" just to the (x+1) expression, as in the "unspecified behaviour" case, then we could still say that the output is always (100%) 2. Attaching the "undefined behaviour" just to (x+1) means that we cannot say whether it is 1+1 or 2+1; it is just "anything". But again, that "anything" is dropped because of the printf, and so the answer would be "always (100%) 2".
Instead, because of mysterious asymmetries, the "undefined behaviour" cannot be attached just to x+1; it must affect at least the ++x (which, by the way, is what is responsible for the undefined behaviour), if not the whole statement. If it infects just the ++x expression, the output is an "undefined value", i.e. any integer, e.g. -5847834 or 9032. If it infects the whole statement, then you could see garbage in your console output; likely you would have to stop the program with Ctrl-C, possibly before it starts to choke your CPU.
According to an urban legend, the "undefined behaviour" infects not only the whole program but also your computer and the laws of physics, so that mysterious creatures can be created by your program and fly away or eat you.
No answer explains the topic competently; they are all just "oh, see, the standard says this" (and that is just an interpretation, as usual!). So at least you have learned that "standards exist" and that they make educational questions arid (since, of course, your code is wrong regardless of undefined/unspecified behaviourism and other standard facts), logical arguments useless, and deep investigation and understanding aimless.
This was an interview question. I said they were the same, but this was adjudged an incorrect response. From the assembler point of view, is there any imaginable difference? I have compiled two short C programs using default gcc optimization and -S to see the assembler output, and they are the same.
The interviewer may have wanted an answer something like this:
i=i+1 will have to load the value of i, add one to it, and then store the result back to i. In contrast, ++i may simply increment the value using a single assembly instruction, so in theory it could be more efficient. However, most compilers will optimize away the difference, and the generated code will be exactly the same.
FWIW, the fact that you know how to look at assembly makes you a better programmer than 90% of the people I've had to interview over the years. Take solace in the fact that you won't have to work with the clueless loser who interviewed you.
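For instance, here is a quick way to check on your own machine (a sketch; the function names are mine, and the exact instructions vary by target and GCC version):
/* inc.c -- compare the two with: gcc -O2 -S inc.c && cat inc.s */
int pre(int i)  { ++i;       return i; }
int plus(int i) { i = i + 1; return i; }
/* At -O2 on x86-64, both functions typically compile to the same
   single instruction, e.g. leal 1(%rdi), %eax. */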
Looks like you were right and they were wrong. I had a similar issue in a job interview, where I gave the correct answer that was deemed incorrect.
I confidently argued the point with my interviewer, who obviously took offence to my impudence. I didn't get the job, but then again, working under someone who "knows everything" wouldn't be that desirable either.
You are probably right. A naive compiler might compile:
++i to inc [ax]
and
i = i + 1 to add [ax], 1
but any half-sensible compiler will just emit the same single instruction for both.
This all assumes the relevant architecture has inc and add instructions (as x86 does).
To defend the interviewer, context is everything. What is the type of i? Are we talking C or C++ (or some other C like language)? Were you given:
++i;
i = i + 1;
or was there more context?
If I had been asked this, my first response would've been "is i volatile?" If the answer is yes, then the difference is huge. If not, the difference is small and semantic, but pragmatically none. The proof of that is the difference in parse tree, and the ultimate meaning of the subtrees generated.
So it sounds like you got the pragmatic side right, but the semantic/critical thought side wrong.
To attack the interviewer (without context), I'd have to wonder what the purpose of the question was. If I asked it, I'd want to use it to find out whether the candidate knew subtle semantic differences, how to generate a parse tree, how to think critically, and so on and so forth. I typically ask a C question of my interviewees that nearly every candidate gets wrong - and that's by design. I actually don't care about the answer to the question; I care about the journey I will take with the candidate to reach understanding, which tells me far more about them than right/wrong on a trivia question.
In C++, it depends if i is an int or an object. If it's an object, it would probably generate a temporary instance.
Context is the main thing here: in an optimized release build the compiler will reduce a bare i++ whose result is unused to a simple inc eax, whereas something like int some_int = i++ needs to store i's value into some_int first and only then increment i.
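A minimal illustration of that last point (assuming an ordinary int):
int i = 5;
i++;                 /* result unused: can compile to a bare increment */
int some_int = i++;  /* the old value must be kept: some_int becomes 6,
                        then i becomes 7 */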