This question already has answers here:
Why are these constructs using pre and post-increment undefined behavior?
#include <stdio.h>
#define CUBE(x) (x*x*x)

int main()
{
    int a, b = 3;
    a = CUBE(++b);
    printf("%d, %d\n", a, b);
    return 0;
}
This code prints a = 150 and b = 6. Please explain this.
I thought a would be calculated as a = 4*5*6 = 120, but that is not what the compiler produces, so please explain the logic.
There's no logic, it's undefined behavior because
++b * ++b * ++b;
modifies and reads b 3 times with no interleaving sequence points.
Bonus: You'll see another weird behavior if you try CUBE(1+2).
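To see why, expand the macro by hand. A minimal sketch (CUBE2 is a name I'm introducing for the parenthesized variant; it is not in the question):

#include <stdio.h>
#define CUBE(x)  (x*x*x)
#define CUBE2(x) ((x)*(x)*(x))   /* parenthesized variant, my own addition */

int main(void)
{
    /* CUBE(1+2) expands to (1+2*1+2*1+2) = 1+2+2+2 = 7, not 27 */
    printf("%d\n", CUBE(1+2));

    /* CUBE2 fixes the precedence problem and prints 27, but it still
       evaluates its argument three times, so CUBE2(++b) would still be UB */
    printf("%d\n", CUBE2(1+2));
    return 0;
}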
In addition to what Luchian Grigore said (which explains why you observe this weird behavior), you should notice that this macro is horrible: it can cause subtle and very difficult to track down bugs, especially when called with an expression that has side effects (like ++b), since the expression will be evaluated multiple times.
You should learn three things from this:
Never reference a macro argument more than once in a macro. While there are exceptions to this rule, you should prefer to think of it as absolute.
Try to avoid calling macros with arguments that have side effects, if possible.
Try to avoid function-like macros when possible. Use inline functions instead (see the sketch below).
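A minimal sketch of that last point, replacing the macro with an inline function (my own example, not from the original post):

#include <stdio.h>

/* Evaluates its argument exactly once, so cube(++b) is well defined. */
static inline int cube(int x)
{
    return x * x * x;
}

int main(void)
{
    int b = 3;
    int a = cube(++b);   /* b is incremented once: a = 4*4*4 = 64, b = 4 */

    printf("%d, %d\n", a, b);
    return 0;
}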
It's undefined behavior to modify the same variable more than once without an intervening sequence point, and because of this you will get different results with different compilers for your code.
By chance, I also get the same result, a = 150 and b = 6, with my compiler:
gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5)
Your macro expression a = CUBE(++b); expands to
a = ++b * ++b * ++b;
And b is changed more than once before the end of the full expression.
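You can confirm the expansion yourself by running only the preprocessor:
gcc -E x.c
which prints the translation unit after macro substitution.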
To see how my compiler translates this expression at a low level (maybe your compiler does something similar, so you can try the same technique), I compiled the C source with the -S option to get assembly code:
gcc x.c -S
This produces an x.s file.
I am showing only the relevant part of the assembly (read the comments).
Since you wanted to know where the 150 comes from, that is why I am adding my answer.
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $16, %rsp
movl $3, -8(%rbp) // b = 3
addl $1, -8(%rbp) // b = 4
addl $1, -8(%rbp) // b = 5
movl -8(%rbp), %eax // eax = b
imull -8(%rbp), %eax // 5*5 = 25
addl $1, -8(%rbp) // `b` becomes 6
imull -8(%rbp), %eax // 6 * 25 = 150
movl %eax, -4(%rbp) // 150 is assigned to `a`
movl $.LC0, %eax // printf function stuff...
movl -8(%rbp), %edx
movl -4(%rbp), %ecx
movl %ecx, %esi
movq %rax, %rdi
On inspecting this assembly code I can see that it evaluates the expression as
a = 5 * 5 * 6, so a becomes 150, and after three increments b becomes 6.
Different compilers may produce different results, but I think that for b = 3, 150 can only come from evaluating your expression in the sequence 5 * 5 * 6.
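For comparison, if you want the 4 * 5 * 6 = 120 the question expected, each increment has to be separated from the next by a sequence point. A minimal well-defined rewrite (my own example):

#include <stdio.h>

int main(void)
{
    int b = 3;
    int a = 1;

    a *= ++b;   /* b = 4, a = 4   */
    a *= ++b;   /* b = 5, a = 20  */
    a *= ++b;   /* b = 6, a = 120 */

    printf("%d, %d\n", a, b);   /* prints 120, 6 */
    return 0;
}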
Related
Consider the following code for producing a list of numbers from 0 to 9 along with values of 2 and -3 raised to the power of the corresponding number from the list:
#include <stdio.h>

int power(int m, int n);

main()
{
    int i;

    for (i = 0; i <= 10; ++i)
        printf("%d %d %d\n", i, power(2, i), power(-3, i));
    return 0;
}

int power(int base, int n)
{
    int i, p;

    p = 1;
    for (i = 1; i <= n; ++i)
        p = p * base;
    // return statement purposefully omitted
}
Of course the program does not work properly without the return statement in the power function; however, running the code as written I get the following output:
0 1 1
1 2 2
2 3 3
3 4 4
4 5 5
5 6 6
6 7 7
7 8 8
8 9 9
9 10 10
And I'm wondering: where are the numbers in the second and third columns of the output coming from? In the absence of a valid return value from power, control transfers back to the calling function, but why does it output these numbers?
As pointed out by @dyukha and @Daniel H, the value returned is "whatever is in your EAX register". Your function finishes with a for loop, so the last instruction executed before returning (ending the function) was probably the branch test checking whether i <= n (your loop condition). You can check which variable ends up in your EAX register by using your compiler to generate the assembly version of your code (option -S). You can try to follow what values are put into that register before the
popq %rbp
retq
at the end of your function.
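For instance (assuming the source file is called power.c; substitute your own file name):
cc -S power.c
which leaves the generated assembly in power.s.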
On my computer, I tried with Apple LLVM version 9.0.0 (clang-900.0.39.2), which generates the following for my function:
movl %edi, -8(%rbp)
movl %esi, -12(%rbp)
movl $1, -20(%rbp)
movl $1, -16(%rbp)
LBB2_1: ## =>This Inner Loop Header: Depth=1
movl -12(%rbp), %eax
cmpl -16(%rbp), %eax
jl LBB2_4
## BB#2: ## in Loop: Header=BB2_1 Depth=1
movl -20(%rbp), %eax
imull -8(%rbp), %eax
movl %eax, -20(%rbp)
## BB#3: ## in Loop: Header=BB2_1 Depth=1
movl -16(%rbp), %eax
addl $1, %eax
movl %eax, -16(%rbp)
jmp LBB2_1
LBB2_4:
movl -4(%rbp), %eax
popq %rbp
retq
As you can see, I have 4 remarkable addresses: -8(%rbp), -12(%rbp), -16(%rbp) and -20(%rbp). Given the order of declaration in the C code, and the order of initialisation, -8(%rbp) is base, -12(%rbp) is n, -16(%rbp) is i and -20(%rbp) is p.
LBB2_1 is your loop condition. The instructions for the check are:
* move the value of n into %eax
* compare the value of i with %eax; the outcome is recorded in the CPU flags (not in %eax)
* if the flags say n is lower than i, jump to label LBB2_4 (leave the loop), otherwise continue with the next instruction
The three instructions after BB#2 are your actual multiplication (p = p * base). The three instructions after BB#3 are your increment of i, followed by an unconditional jump back to your loop condition at label LBB2_1.
The power function ends by taking whatever is at memory address -4(%rbp), putting it in %eax, and then leaving the function (the stack frame is popped and the caller uses whatever is in %eax as the return value).
In the code produced by my compiler I don't see the same result as you do: I get the last two columns equal to 0 every time, because -4(%rbp) is never set to anything. That changes when I add a call to another function foo that takes two integers as parameters and has two local integer variables (to ensure that its stack frame is the same size as the one of power): that function actually writes to the address -4(%rbp). When I call it right before entering the loop, I indeed see the value foo left at -4(%rbp) being returned from power.
As a colleague just told me, playing with undefined behaviour is dangerous as your compiler is allowed to treat it any way it likes. It could be summoning a demon for what it's worth.
In short, or TL;DR: this undefined behaviour is handled in some way by the compiler. Whether some value is moved from a local variable (initialised or not), or nothing in particular is moved into the register %eax, is up to the compiler. Either way, whatever happened to be sitting there is what gets returned when retq is executed.
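For completeness, the behaviour becomes well defined once power actually returns its result:

int power(int base, int n)
{
    int i, p;

    p = 1;
    for (i = 1; i <= n; ++i)
        p = p * base;
    return p;   /* without this line the caller reads a garbage value */
}

Compilers will usually flag the original if you ask: gcc and clang emit "warning: control reaches end of non-void function" (-Wreturn-type, which is included in -Wall).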
In C, what is the assembly for "b++"?
I got two situations:
1) one instruction
addl $0x1,-4(%rbp)
2) three instructions
movl -4(%rbp), %eax
leal 1(%rax), %edx
movl %edx, -4(%rbp)
Are these two situations caused by the compiler?
my code:
int main()
{
    int ret = 0;
    int i = 2;

    ret = i++;
    ret = ++i;
    return ret;
}
the .s file (++i uses the addl instruction, i++ uses the other sequence):
.file "main.c"
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl $0, -8(%rbp) //ret
movl $2, -4(%rbp) //i
movl -4(%rbp), %eax
leal 1(%rax), %edx
movl %edx, -4(%rbp)
movl %eax, -8(%rbp)
addl $1, -4(%rbp)
movl -4(%rbp), %eax
movl %eax, -8(%rbp)
movl -8(%rbp), %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu 5.3.1-14ubuntu2) 5.3.1 20160413"
.section .note.GNU-stack,"",@progbits
The ISO standard does not mandate at all what happens under the covers. It specifies a "virtual machine" that acts in a certain way given the C instructions you provide to it.
So, if your C compiler is implemented as a C-to-Dartmouth-Basic converter, b++ is just as likely to lead to 10 let b = b + 1 as anything else :-)
If you're compiling to common assembler code, then you're likely to see a difference depending on whether you use the result, specifically b++; as opposed to a = b++ since the result of the former can be safely thrown away.
You're also likely to see massive differences based on optimisation level.
Bottom line: short of specifying all the things that can affect the output (including but not limited to the compiler, the target platform, and the optimisation level), the question cannot really be answered.
The first one is the output for ++i as part of ret = ++i. It doesn't need to keep the old value around, because it's doing ++i and then ret=i. Incrementing in memory and then reloading that is a really stupid and inefficient way to compile it, but you compiled with optimization disabled so gcc isn't even trying to make good asm output.
The 2nd one is the output for i++ as part of ret = i++. It needs to keep the old value of i around, so it loads into a register and uses lea to calculate i+1 in a different register. It could have just stored to ret and then incremented the register before storing back to i, but I guess with optimizations disabled gcc doesn't notice that.
Previous answer to the previous vague question without source, and with bogus code:
The asm for a tiny expression like b++ totally depends on the surrounding code in the rest of the function (or with optimization disabled, at least the rest of the statement) and whether it's a global or local, and whether it's declared volatile.
And of course compiler optimization options have a massive impact; with optimization disabled, gcc makes a separate block of asm for every C statement so you can use the GDB jump command to go to a different source line and have the code still produce the same behaviour you'd expect from the C abstract machine. Obviously this highly constrains code-gen: nothing is kept in registers across statements. This is good for source-level debugging, but sucks to read by hand because of all the noise of store/reload.
For the choice of inc vs. add, see INC instruction vs ADD 1: Does it matter? clang -O3 -mtune=bdver2 uses inc for memory-destination increments, but with generic tuning or any Intel P6 or Sandybridge-family CPU it uses add $1, (mem) for better micro-fusion.
See How to remove "noise" from GCC/clang assembly output?, especially the link to Matt Godbolt's CppCon2017 talk about looking at and making sense of compiler asm output.
The 2nd version in your original question looks like mostly un-optimized compiler output for this weird source:
// inside some function
int b;
// leaq -4(%rbp), %rax // rax = &b
b++; // incl (%rax)
b = (int)&b; // mov %eax, -4(%rbp)
(The question has since been edited to different code; it looks like the original was mis-typed by hand, mixing an opcode from one line with an operand from another line. I reproduce it here so all the comments about it being weird still make sense. For the updated code, see the first half of my answer: it depends on surrounding code and on having optimization disabled. Using ret = b++ needs the old value of b, not the incremented value, hence different asm.)
If that's not what your source does, then you must have left out some intervening instructions or something. Or else the compiler is re-using that stack slot for something else.
I'm curious what compiler you got that from, because gcc and clang typically don't like to use results they just computed. I'd have expected incl -4(%rbp).
Also that doesn't explain mov %eax, -4(%rbp). The compiler already used the address in %rax for inc, so why would a compiler revert to a 1-byte-longer RBP-relative addressing mode instead of mov %eax, (%rax)? Referencing fewer different registers that haven't been recently written is a good thing for Intel P6-family CPUs (up to Nehalem), to reduce register-read stalls. (Otherwise irrelevant.)
Using RBP as a frame pointer (and doing increments in memory instead of keeping simple variables in registers) looks like un-optimized code. But it can't be from gcc -O0, because it computes the address before the increment, and those have to be from two separate C statements.
b++ = &b; isn't valid because b++ isn't an lvalue. Well actually the comma operator lets you do b++, b = &b; in one statement, but gcc -O0 still evaluates it in order, rather than computing the address early.
Of course with optimization enabled, b would have to be volatile to explain incrementing in memory right before overwriting it.
clang is similar, but actually does compute that address early. For b++; b = &b;, notice that clang6.0 -O0 does an LEA and keeps RAX around across the increment. I guess clang's code-gen doesn't support consistent debugging with GDB's jump the way gcc does.
leaq -4(%rbp), %rax
movl -4(%rbp), %ecx
addl $1, %ecx
movl %ecx, -4(%rbp)
movl %eax, %ecx # copy the LEA result
movl %ecx, -4(%rbp)
I wasn't able to get gcc or clang to emit the sequence of instructions you show in the question with unoptimized or optimized + volatile, on the Godbolt compiler explorer. I didn't try ICC or MSVC, though. (Although unless that's disassembly, it can't be MSVC because it doesn't have an option to emit AT&T syntax.)
Any good compiler will optimise b++ to ++b if the result of the expression is discarded. You see this particularly in increments in for loops.
That's what is happening in your "one instruction" case.
It's not typically instructive to look at un-optimized compiler output, since values (variables) will usually be updated using a load-modify-store paradigm. This might be useful initially when getting to grips with assembly, but it's not the output to expect from an optimizing compiler that maintains values, pointers, etc., in registers for frequent use. (see: locality of reference)
/* un-optimized logic: */
int i = 2;
ret = i++; /* assign ret <- i, and post-increment i (ret = i; i++ (i = 3)) */
ret = ++i; /* pre-increment i, and assign ret <- i (++i (i = 4); ret = i) */
i.e., any modern, optimising compiler can easily determine that the final value of ret is (4).
Removing all the extraneous directives, etc., gcc-7.3.0 on OS X gives me:
_main: /* Darwin x86-64 ABI adds leading underscores to symbols... */
movl $4, %eax
ret
Apple's native clang and the MacPorts clang-6.0 set up a basic stack frame, but still optimise the ret arithmetic away:
_main:
pushq %rbp
movq %rsp, %rbp
movl $4, %eax
popq %rbp
retq
Note that the Mach-O (OS X) ABI is very similar to the ELF ABI for user-space code. Just try compiling with at least -O2 to get a feel for 'real' (production) code.
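For example, with the main.c from the question (the name appears in the .file directive of the posted listing):
gcc -O2 -S main.c
and compare the resulting main.s with the unoptimized listing.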
It is a common optimization to use conditional move (assembly cmov) to optimize the conditional expression ?: in C. However, the C standard says:
The first operand is evaluated; there is a sequence point between its evaluation and the evaluation of the second or third operand (whichever is evaluated). The second operand is evaluated only if the first compares unequal to 0; the third operand is evaluated only if the first compares equal to 0; the result is the value of the second or third operand (whichever is evaluated), converted to the type described below.110)
For example, the following C code
#include <stdio.h>
int main() {
    int a, b;
    scanf("%d %d", &a, &b);
    int c = a > b ? a + 1 : 2 + b;
    printf("%d", c);
    return 0;
}
will generate the following optimized asm for the relevant part:
call __isoc99_scanf
movl (%rsp), %esi
movl 4(%rsp), %ecx
movl $1, %edi
leal 2(%rcx), %eax
leal 1(%rsi), %edx
cmpl %ecx, %esi
movl $.LC1, %esi
cmovle %eax, %edx
xorl %eax, %eax
call __printf_chk
According to the standard, the conditional expression will only have one branch evaluated. But here both branches are evaluated, which is against the standard's semantics. Is this optimization against the C standard? Or do many compiler optimizations have something inconsistent with the language standard?
The optimization is legal, due to the "as-if rule", i.e. C11 5.1.2.3p6.
A conforming implementation is just required to produce a program that when run produces the same observable behaviour as the execution of the program using the abstract semantics would have produced. The rest of the standard just describes these abstract semantics.
What the compiled program does internally does not matter at all; the only thing that matters is that when the program ends it does not have any other observable behaviour, apart from reading a and b and printing the value of a + 1 or b + 2 depending on which of a or b is greater, unless something occurs that makes the behaviour undefined. (Bad input leaves a and b uninitialized, so accessing them is undefined; a range error or signed overflow can occur too.) If undefined behaviour occurs, then all bets are off.
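The flip side of the as-if rule is that the compiler may only evaluate the arm that was not selected when doing so cannot change observable behaviour. A sketch (pick is a hypothetical function, not from the question) where evaluating both arms would be invalid:

int pick(const int *p, int q)
{
    /* The compiler must not unconditionally load *p and then select with
       cmov: when p == NULL the abstract machine never touches *p, and a
       speculative load could fault, which is an observable difference. */
    return p ? *p + 1 : q;
}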
Since accesses to volatile variables must be evaluated strictly according to the abstract semantics, you can get rid of the conditional move by using volatile here:
#include <stdio.h>
int main() {
    volatile int a, b;
    scanf("%d %d", &a, &b);
    int c = a > b ? a + 1 : 2 + b;
    printf("%d", c);
    return 0;
}
compiles to
call __isoc99_scanf@PLT
movl (%rsp), %edx
movl 4(%rsp), %eax
cmpl %eax, %edx
jg .L7
movl 4(%rsp), %edx
addl $2, %edx
.L3:
leaq .LC1(%rip), %rsi
xorl %eax, %eax
movl $1, %edi
call __printf_chk@PLT
[...]
.L7:
.cfi_restore_state
movl (%rsp), %edx
addl $1, %edx
jmp .L3
by my GCC Ubuntu 7.2.0-8ubuntu3.2
The C Standard describes an abstract machine executing C code. A compiler is free to perform any optimization as long as that abstraction is not violated, i.e. a conforming program cannot tell the difference.
Suppose I have the following C code:
#include <stdio.h>

int main()
{
    int x = 11;
    int y = x + 3;
    printf("%d\n", x);
    return 0;
}
Then I compile it into asm using gcc, and I get this (with some output removed):
main:
pushq %rbp
movq %rsp, %rbp
subq $16, %rsp
movl $11, -4(%rbp)
movl -4(%rbp), %eax
addl $3, %eax
movl %eax, -8(%rbp)
movl -4(%rbp), %eax
movl %eax, %esi
movl $.LC0, %edi
movl $0, %eax
call printf
movl $0, %eax
leave
ret
My question is: why movl -4(%rbp), %eax followed by movl %eax, %esi, rather than a single movl -4(%rbp), %esi (which works fine according to my experiment)?
You probably did not enable optimizations.
Without optimization the compiler will produce code like this. For one thing, it does not allocate variables to registers, but on the stack. This means that when you operate on a variable, its value is first transferred to a register and then operated on.
So given that x is allocated at -4(%rbp), this is what the code looks like if you translate each statement directly, without optimization. First you move 11 into the storage of x:
movl $11, -4(%rbp)
That's the first statement done. The next statement evaluates x+3 and places it in the storage of y (which is -8(%rbp)); this is done without regard to the previously generated code:
movl -4(%rbp), %eax
addl $3, %eax
movl %eax, -8(%rbp)
That's the second statement done. By the way, it is divided into two parts: the evaluation of x+3 and the storing of the result. The compiler then continues to generate code for the printf statement, again without taking earlier statements into account.
If, on the other hand, you enable optimization, the compiler does a number of smart and, to humans, obvious things. One thing is that it allows variables to be allocated to registers, or at least keeps track of where the value of each variable can be found. In this case the compiler would, for example, know in the second statement that x is not only stored at -4(%rbp), it also knows that x has the value 11 (yes, it knows its actual value). It can then use this to add 3, which means it knows the result to be 14 (but it's smarter than that: it has also seen that you don't use that variable, so it skips that statement entirely).

The next statement is the printf statement, and here it can use the fact that it knows x to be 11 and pass that value directly to printf. By the way, it also realizes that it never needs the storage of x at -4(%rbp). Finally, it may know what printf does (since you included stdio.h), so it can analyze the format string and do the conversion at compile time, replacing the printf call with one that directly writes 11 to standard output.
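Conceptually (a sketch, not literal compiler output), after constant propagation and dead-store elimination the optimized program behaves as if you had written:

#include <stdio.h>

int main()
{
    printf("%d\n", 11);   /* x is known to be 11; y = x + 3 is never used */
    return 0;
}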
Here is a code fragment for which GCC produces a result I didn't expect:
(I am using gcc version 4.6.1 Ubuntu/Linaro 4.6.1-9ubuntu3 for target i686-linux-gnu)
[test.c]
#include <stdio.h>
int *ptr;
int f(void)
{
    (*ptr)++;
    return 1;
}

int main()
{
    int a = 1, b = 2;

    ptr = &b;
    a = b++ + f() + f() ? b : a;
    printf("b = %d\n", b);
    return a;
}
In my understanding, there is a sequence point at a function call.
The post-increment should take place before f().
see C99 5.1.2.3:
"... called sequence points, all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have taken place."
For this test case, the order of evaluation may be unspecified,
but the final result should be the same either way. So I expect b's final value to be 5.
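Spelling that expectation out (the value contributed by b++ depends on the evaluation order, but b is incremented three times in total in every allowed order):

/* One possible order:
     b = 2
     b++   -> contributes 2 to the sum, b becomes 3
     f()   -> (*ptr)++ makes b = 4, returns 1
     f()   -> (*ptr)++ makes b = 5, returns 1
   The sum (here 2 + 1 + 1 = 4; 5 or 6 for the other orders) is always
   nonzero, so a = b and b should end up as 5. */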
However, after compiling this case with 'gcc test.c -std=c99', the output shows b = 3.
Then I use "gcc test.c -std=c99 -S" to see what happened:
movl $1, 28(%esp)
movl $2, 24(%esp)
leal 24(%esp), %eax
movl %eax, ptr
movl 24(%esp), %ebx
call f
leal (%ebx,%eax), %esi
call f
addl %esi, %eax
testl %eax, %eax
setne %al
leal 1(%ebx), %edx
movl %edx, 24(%esp)
testb %al, %al
je .L3
movl 24(%esp), %eax
jmp .L4
.L3:
movl 28(%esp), %eax
.L4:
movl %eax, 28(%esp)
It seems that GCC uses the value of b evaluated before the calls to f() and performs the '++'
operation only after the two f() calls.
I also compiled this case with llvm-clang,
and the result shows b = 5, which is what I expect.
Is my understanding of post-increment and sequence point behavior incorrect?
Or is this a known issue in GCC 4.6.1?
In addition to Clang, there are two other tools that you can use as reference: Frama-C's value analysis and KCC. I won't go into the details of how to install them or use them for this purpose, but they can be used to check the definedness of a C program—unlike a compiler, they are designed to tell you if the target program exhibits undefined behavior.
They have their rough edges, but they both think that b should definitely be 5 with no undefined behavior at the end of your program:
Mini:~/c-semantics $ dist/kcc ~/t.c
Mini:~/c-semantics $ ./a.out
b = 5
This is an even stronger argument than Clang thinking so (since if it was undefined behavior, Clang could still generate a program that prints b = 5).
Long story short, it looks like you have found a bug in that version of GCC. The next step is to check out the SVN to see if it's still present there.
I reported this GCC bug some time ago and it was fixed earlier this year. See http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48814