It is a common optimization to use a conditional move (the cmov instruction in assembly) to implement the conditional expression ?: in C. However, the C standard says:
The first operand is evaluated; there is a sequence point between its evaluation and the
evaluation of the second or third operand (whichever is evaluated). The second operand
is evaluated only if the first compares unequal to 0; the third operand is evaluated only if
the first compares equal to 0; the result is the value of the second or third operand
(whichever is evaluated), converted to the type described below.110)
For example, the following C code
#include <stdio.h>
int main() {
int a, b;
scanf("%d %d", &a, &b);
int c = a > b ? a + 1 : 2 + b;
printf("%d", c);
return 0;
}
will generate, with optimization, assembly like the following:
call __isoc99_scanf
movl (%rsp), %esi
movl 4(%rsp), %ecx
movl $1, %edi
leal 2(%rcx), %eax
leal 1(%rsi), %edx
cmpl %ecx, %esi
movl $.LC1, %esi
cmovle %eax, %edx
xorl %eax, %eax
call __printf_chk
According to the standard, the conditional expression will only have one branch evaluated. But here both branches are evaluated, which is against the standard's semantics. Is this optimization against the C standard? Or do many compiler optimizations have something inconsistent with the language standard?
The optimization is legal, due to the "as-if rule", i.e. C11 5.1.2.3p6.
A conforming implementation is just required to produce a program that when run produces the same observable behaviour as the execution of the program using the abstract semantics would have produced. The rest of the standard just describes these abstract semantics.
What the compiled program does internally does not matter at all; the only thing that matters is that when the program ends it has no observable behaviour other than reading a and b and printing the value of a + 1 or b + 2, depending on which of a or b is greater, unless something occurs that causes the behaviour to be undefined. (Bad input leaves a and b uninitialized, so accessing them is undefined; a range error or signed overflow can occur too.) If undefined behaviour occurs, then all bets are off.
Since accesses to volatile variables must be evaluated strictly according to the abstract semantics, you can get rid of the conditional move by using volatile here:
#include <stdio.h>
int main() {
volatile int a, b;
scanf("%d %d", &a, &b);
int c = a > b ? a + 1 : 2 + b;
printf("%d", c);
return 0;
}
compiles to
call __isoc99_scanf@PLT
movl (%rsp), %edx
movl 4(%rsp), %eax
cmpl %eax, %edx
jg .L7
movl 4(%rsp), %edx
addl $2, %edx
.L3:
leaq .LC1(%rip), %rsi
xorl %eax, %eax
movl $1, %edi
call __printf_chk@PLT
[...]
.L7:
.cfi_restore_state
movl (%rsp), %edx
addl $1, %edx
jmp .L3
with my GCC (Ubuntu 7.2.0-8ubuntu3.2).
The C Standard describes an abstract machine executing C code. A compiler is free to perform any optimization as long as that abstraction is not violated, i.e. a conforming program cannot tell the difference.
Related
Is there any difference between n /= 10 and n = n / 10 execution-speed-wise?
Just like n-- and --n supposedly differ in their execution speed...
No, not really:
[C99: 6.5.16.2/3]: A compound assignment of the form E1 op= E2 differs from the simple assignment expression E1 = E1 op (E2) only in that the lvalue E1 is evaluated only once.
So, this has consequences only if your n is a non-trivial expression with side-effects (such as a function call).
Otherwise, I suppose in theory an intermediate temporary variable will be involved, but you'd have to be remarkably unlucky for such a temporary to actually survive in your compiled executable. You're not going to see any performance difference between the two approaches.
Confirm this with benchmarks, and by comparing the resulting assembly.
Given the following C-code:
int f1(int n) {
n /= 10;
return n;
}
int f2(int n) {
n = n / 10;
return n;
}
compiled with gcc -O4 essentially results in
f1:
movl %edi, %eax
movl $1717986919, %edx
sarl $31, %edi
imull %edx
sarl $2, %edx
subl %edi, %edx
movl %edx, %eax
ret
f2:
movl %edi, %eax
movl $1717986919, %edx
sarl $31, %edi
imull %edx
sarl $2, %edx
subl %edi, %edx
movl %edx, %eax
ret
I have omitted some boilerplate which is part of the listing in reality.
In this specific case, there is no difference between the two alternatives.
Depending on the compiler used, on the actual environment where the instructions are executed and on compiler optimization levels, the generated code might be different. But you can always use this approach to check if the resulting machine code differs or not.
There is no difference between the two.
I checked both expressions in the KEIL cross compiler, and the same instructions (and hence the same execution time) result:
=================================================
5: x=x/5;
6:
C:0x0005 EF **MOV A,R7**
C:0x0006 75F005 **MOV B(0xF0),#0x05**
C:0x0009 84 **DIV AB**
7: x/=5;
C:0x000A 75F005 **MOV B(0xF0),#0x05**
C:0x000D 84 **DIV AB**
C:0x000E FF **MOV R7,A**
================================================
So there is no difference, just as with --n and n--.
This question already has answers here:
Why are these constructs using pre and post-increment undefined behavior?
(14 answers)
Closed 9 years ago.
#include<stdio.h>
#define CUBE(x) (x*x*x)
int main()
{
int a, b=3;
a = CUBE(++b);
printf("%d, %d\n", a, b);
return 0;
}
This code prints a = 150 and b = 6. Please explain this.
I thought a would be calculated as a = 4*5*6 = 120, but that is not what the compiler produces, so please explain the logic...
There's no logic, it's undefined behavior because
++b * ++b * ++b;
modifies and reads b 3 times with no interleaving sequence points.
Bonus: You'll see another weird behavior if you try CUBE(1+2).
In addition to what Luchian Grigore said (which explains why you observe this weird behavior) you should notice that this macro is horrible: it can cause subtle and very difficult to track-down bugs, especially when called with a statement that has side-effects (like ++b) since this will cause the statement to execute multiple times.
You should learn three thing from this:
Never reference a macro argument more than once in a macro. While there are exceptions to this rule, you should prefer to think of it as absolute.
Try to avoid calling macros with statements that contain side-effects if possible.
Try to avoid function-like macros when possible. Use inline functions instead.
It is undefined behavior to modify the same variable more than once between two sequence points. For this reason you may get different results with different compilers for your code.
By chance, I also get the same result, a = 150 and b = 6, with my compiler:
gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5)
Your macro expression a = CUBE(++b); expands to
a = ++b * ++b * ++b;
and b is modified more than once before the end of the full expression.
But how does my compiler translate this expression at a low level? (Yours may do something similar, and you can try the same technique.) To see, I compiled the C source with the -S option to get assembly code:
gcc x.c -S
You will get an x.s file.
Since you want to know how the output becomes 150, I am showing the relevant part of the assembly (read the comments):
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $16, %rsp
movl $3, -8(%rbp) // b = 3
addl $1, -8(%rbp) // b = 4
addl $1, -8(%rbp) // b = 5
movl -8(%rbp), %eax // eax = b
imull -8(%rbp), %eax // 5*5 = 25
addl $1, -8(%rbp) // b becomes 6
imull -8(%rbp), %eax // 6 * 25 = 150
movl %eax, -4(%rbp) // a = 150
movl $.LC0, %eax // printf function stuff...
movl -8(%rbp), %edx
movl -4(%rbp), %ecx
movl %ecx, %esi
movq %rax, %rdi
Inspecting this assembly, I can see it evaluates the expression as
a = 5 * 5 * 6, so a becomes 150, and after three increments b becomes 6.
Although different compilers may produce different results, I think 150 can only arise from this sequence: for b = 3, your expression is evaluated as 5*5*6.
There is a code fragment that GCC produce the result I didn't expect:
(I am using gcc version 4.6.1 Ubuntu/Linaro 4.6.1-9ubuntu3 for target i686-linux-gnu)
[test.c]
#include <stdio.h>
int *ptr;
int f(void)
{
(*ptr)++;
return 1;
}
int main()
{
int a = 1, b = 2;
ptr = &b;
a = b++ + f() + f() ? b : a;
printf ("b = %d\n", b);
return a;
}
In my understanding, there is a sequence point at a function call,
so the post-increment should have taken place before f().
see C99 5.1.2.3:
"... called sequence points, all side effects of previous evaluations
shall be complete and no side effects of subsequent evaluations shall
have taken place."
For this test case, perhaps the order of evaluation is unspecified,
but the final result should be the same. So I expect b's final value to be 5.
However, after compiling this case with 'gcc test.c -std=c99', the output shows b = 3.
Then I use "gcc test.c -std=c99 -S" to see what happened:
movl $1, 28(%esp)
movl $2, 24(%esp)
leal 24(%esp), %eax
movl %eax, ptr
movl 24(%esp), %ebx
call f
leal (%ebx,%eax), %esi
call f
addl %esi, %eax
testl %eax, %eax
setne %al
leal 1(%ebx), %edx
movl %edx, 24(%esp)
testb %al, %al
je .L3
movl 24(%esp), %eax
jmp .L4
.L3:
movl 28(%esp), %eax
.L4:
movl %eax, 28(%esp)
It seems that GCC uses the value of b evaluated before f() and performs the '++'
operation only after the two f() calls.
I also used llvm-clang to compile this case,
and the result shows b = 5, which is what I expect.
Is my understanding of post-increment and sequence-point behavior incorrect?
Or is this a known issue in GCC 4.6.1?
In addition to Clang, there are two other tools that you can use as reference: Frama-C's value analysis and KCC. I won't go into the details of how to install them or use them for this purpose, but they can be used to check the definedness of a C program—unlike a compiler, they are designed to tell you if the target program exhibits undefined behavior.
They have their rough edges, but they both think that b should definitely be 5 with no undefined behavior at the end of your program:
Mini:~/c-semantics $ dist/kcc ~/t.c
Mini:~/c-semantics $ ./a.out
b = 5
This is an even stronger argument than Clang thinking so (since if it was undefined behavior, Clang could still generate a program that prints b = 5).
Long story short, it looks like you have found a bug in that version of GCC. The next step is to check out the SVN to see if it's still present there.
I reported this GCC bug some time ago and it was fixed earlier this year. See http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48814
I have to translate this C code to assembly code:
#include <stdio.h>
int main(){
int a, b,c;
scanf("%d",&a);
scanf("%d",&b);
if (a == b){
b++;
}
if (a > b){
c = a;
a = b;
b = c;
}
printf("%d\n",b-a);
return 0;
}
My code is below, and incomplete.
rdint %eax # reading a
rdint %ebx # reading b
irmovl $1, %edi
subl %eax,%ebx
addl %ebx, %edi
je Equal
irmov1 %eax, %efx #flagged as invalid line
irmov1 %ebx, %egx
irmov1 %ecx, %ehx
irmovl $0, %eax
irmovl $0, %ebx
irmovl $0, %ecx
addl %eax, %efx #flagged as invalid line
addl %ebx, %egx
addl %ecx, %ehx
halt
Basically I think it is mostly done, but I have commented next to the two lines flagged as invalid when I try to run it, and I'm not sure why they are invalid. I'm also not sure how to do an if statement for a > b. I'd appreciate any suggestions from people who know the y86 assembly language.
From what I can find online (1, 2), the only supported registers are: eax, ecx, edx, ebx, esi, edi, esp, and ebp.
You are requesting non-existent registers (efx and further).
Also irmov is for moving an immediate operand (read: constant numerical value) into a register operand, whereas your irmov1 %eax, %efx has two register operands.
Finally, in computer software there's a huge difference between the character representing digit "one" and the character representing letter "L". Mind your 1's and l's. I mean irmov1 vs irmovl.
Jens,
First, Y86 does not have any efx, egx, or ehx registers, which is why you are getting the invalid lines when you run the code through YAS.
Second, you make conditional branches by subtracting two registers using the subl instruction and jumping on the condition code set by the Y86 ALU by ways of the jxx instructions.
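For the a > b swap specifically, one possible sketch of that subl-and-jump pattern (assuming a lives in %eax and b in %ebx; the register choices and the Skip label are mine, and signed overflow in the subtraction is ignored for simplicity):

```
    rrmovl %ebx, %ecx    # ecx = b
    subl   %eax, %ecx    # ecx = b - a, sets the condition codes
    jge    Skip          # b - a >= 0 means a <= b: no swap needed
    rrmovl %eax, %ecx    # c = a
    rrmovl %ebx, %eax    # a = b
    rrmovl %ecx, %ebx    # b = c
Skip:
```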
Check my blog at http://y86tutoring.wordpress.com for details.
Why is the output 44 instead of 12? I guess &x, eip, ebp, &j should be the stack layout for the code given below; if so, the output should be 12, but I am getting 44.
Help me understand the concept: although the base pointer is relocated for every execution, the relative offsets must remain unchanged, so shouldn't the difference be 12?
int main() {
int x = 9;
// printf("%p is the address of x\n",&x);
fun(&x );
printf("%x is the address of x\n", (&x));
x = 3;
printf("%d x is \n",x);
return 0;
}
int fun(unsigned int *ptr) {
int j;
printf("the difference is %u\n",((unsigned int)ptr -(unsigned int) &j));
printf("the address of j is %x\n",&j);
return 0;
}
You're assuming that the compiler has packed everything (instead of putting things on specific alignment boundaries), and you're also assuming that the compiler hasn't inlined the function, or made any other optimisations or transformations.
In summary, you cannot make any assumptions about this sort of thing, or rely on any particular behaviour.
No, it should not be 12. It should not be anything. The ISO standard has very little to say about how things are arranged on the stack. An implementation has a great deal of leeway in moving things around and inserting padding for efficiency.
If you pass it through your compiler with an option to generate the assembler code (such as with gcc -S), it will become evident what's happening here.
fun:
pushl %ebp
movl %esp, %ebp
subl $40, %esp ;; this is quite large (see below).
movl 8(%ebp), %edx
leal -12(%ebp), %eax
movl %edx, %ecx
subl %eax, %ecx
movl %ecx, %eax
movl %eax, 4(%esp)
movl $.LC2, (%esp)
call printf
leal -12(%ebp), %eax
movl %eax, 4(%esp)
movl $.LC3, (%esp)
call printf
movl $0, %eax
leave
ret
It appears that gcc is pre-generating the next stack frame (the one going down to printf) as part of the prologue for the fun function. At least, that's what it does in this case. Your compiler may be doing something totally different. But the bottom line is: the implementation can do what it likes as long as it doesn't violate the standard.
That's the code from optimisation level 0, by the way, and it gives a difference of 48. When I use gcc's insane optimisation level 3, I get a difference of 4. Again, perfectly acceptable, gcc usually does some pretty impressive optimisations at that level.