in C, i have this code piece:
int a;
a = 10 + 5 - 3
I want to ask: where is (10+5-3) stored at?
(As far as I know, a is located on stack, how about (10+5-3)? How does this rvalue get calculated?)
Typically, the r-value is "stored" within the program itself.
In other words, the compiler itself (before the program is ever run) computes the 10 + 5 - 3 value (it can do so since since it is all based on constant immediate values), and it emits the assembly code to store the result of this calculation in whatever l-value for the assignement (in this case, the variable named a, which the compiler probably knows as a relative address to a data segment origin of sorts).
The r-value, which has a value of 12 is therefore only found inside the binary of the program, within a assembly instruction that looks like
mov <some dest, typically DS-relative>, $0C
$0C is the "r-value".
If the r-value happened to be the result of a calculation that can only done at run-time, say if the underlying c code was: a = 17 * x; // x some run time var, the r-value would too be "stored" (or rather materialized) as a series of instructions within the program binary. The difference with the simple "mov dest, imm" above is that it would take several instructions to load the variable x in an accumulator, multiply by 17 and store the result at the address where the variable a is. It is possible that the compiler may "authorize itself" ;-) to use the stack for some intermediate result etc. but such would be
a) completely compiler dependent
b) transiant
c) and typically would only involve part of the r-value
it is therefore safe to say that the r-value is a compile-time concept which is encapsulated in parts of the program (not the data), and isn't stored anywhere but in the program binary.
In response to paxdiablo: the explanation offered above is indeed restrictive of the possibilities because the c standard effectively does not dictate anything of that nature. Never the less, most any r-value is eventually materialized, at least in part, by some instructions which sets things up so that the proper value, whether calculated (at run time) or immediate gets addressed properly.
Constants are probably simplified at compile time, so your question as literally posed may not help. But something like, say, i - j + k that does need to be computed at runtime from some variables, may be "stored" wherever the compiler likes, depending on the CPU architecture: the compiler will typically try to do its best to use registers, e.g.
LOAD AX, i
SUB AX, j
ADD AX, k
to compute such an expression "storing" it in the accumulator register AX, before assigning it to some memory location with STORE AX, dest or the like. I'd be pretty surprised if a modern optimizing compiler on an even semi-decent CPU architecture (yeah, x86 included!-) needed to spill registers to memory for any reasonably simple expression!
This is compiler dependent. Usually the value (12) will be calculated by the compiler. It is then stored in the code, typically as part of a load/move immediate assembly instruction.
The result of the computation in the RHS (right-hand-side) is computed by the compiler in a step that's called "constant propagation".
Then, it is stored as an operand of the assembly instruction moving the value into a
Here's a disassembly from MSVC:
int a;
a = 10 + 5 - 3;
0041338E mov dword ptr [a],0Ch
Where it stores it is actually totally up to the compiler. The standard does not dictate this behavior.
A typical place can be seen by actually compiling the code and looking at the assembler output:
int main (int argc, char *argv[]) {
int a;
a = 10 + 5 - 3;
return 0;
}
which produces:
.file "qq.c"
.def ___main;
.scl 2;
.type 32;
.endef
.text
.globl _main
.def _main;
.scl 2;
.type 32;
.endef
_main:
pushl %ebp
movl %esp, %ebp
subl $8, %esp
andl $-16, %esp
movl $0, %eax
addl $15, %eax
addl $15, %eax
shrl $4, %eax
sall $4, %eax
movl %eax, -8(%ebp)
movl -8(%ebp), %eax
call __alloca
call ___main
movl $12, -4(%ebp) ;*****
movl $0, %eax
leave
ret
The relevant bit is marked ;***** and you can see that the value is created by the compiler and just inserted directly into a mov type instruction.
Note that it's only this simple because the expression is a constant value. As soon as you introduce non-constant values (like variables), the code becomes a little more complicated. That's because you have to look those variables up in memory (or they may already be in a register) and then manipulate the values at run-time, not compile-time.
As to how the compiler calculates what the value should be, that's to do with expression evaluation and is a whole other question :-)
Your question is based on an incorrect premise.
The defining property of lvalue in C is that it has a place in storage, i.e it is stored. This is what differentiates lvalue from rvalue. Rvalue is not stored anywhere. That's what makes it an rvalue. If it were stored, it would be lvalue by definition.
The terms "lvalue" and "rvalue" are used to bisect the world of expressions. That is, (10+5-3) is an expression that happens to be an rvalue (because you cannot apply the & operator to it -- in C++ the rules are more complicated). At runtime, there are no expressions, lvalues or rvalues. In particular, they aren't stored anywhere.
You were wondering where the value 12 was stored, but the value 12 is neither an lvalue nor an rvalue (as opposed to the expression 12 which would be an rvalue, but 12 does not appear in your program).
Related
I know a bit about Assembly. So let me first introduce the codes, then explain my way of thinking.
#This is the Assembly version.
pushq %rbp
movq %rsp, %rbp
movl $2, -4(%rbp)
movl $3, -8(%rbp)
movl $5, %eax
popq %rbp
ret
#This is the C version.
int twothree() {
int a = 2;
int b = 3;
return 2 + 3;
}
Alright, so the first thing that stares me is that we do not use the variables a and b as a + b. So they are unnecessary, we directly sum the integers. Yet, if computers were able to understand that, I guess it would be really scary. So, my question is that: How did this assembly code work without any addl or similar command? We directly move the Immediate (or constant) integer 5 to eax registrar.
Also, quick question. So what happens to the a and b variables after last two lines? Their position in stack (or maybe we can call the 'registrars' they used as a memory place) are free now as we use malloc + free. Is it true or at least logical? popq %rbp is the command for closing the stack I guess.
I am not an expert in Assembly as I said. So most of these thoughts are just thinking. Thanks!
The compiler saw that you were adding two numbers 2 + 3. And the compiler calculated that 2+3=5 and it put 5 in the assembly code. This is called "constant folding".
I guess you have optimization turned off in your compiler since it didn't delete the useless variables a and b. But constant folding is very easy for the compiler (unlike other kinds of optimization) and useful, so it seems that the compiler is doing it even when you don't turn on optimization.
As you figured out, the assembly code does not add 2 and 3 because there is no addl or similar command. It just does return 5;
There are no commands or codes in assembly programming. Instead, assembly code (uncountable) comprises instructions and directives.
Where do you see a use of malloc or free? These are functions for managing dynamic memory which isn't something your program uses. If either of these function were used, you'd have a call malloc or call free instruction in the code somewhere. The variables a and b are all in automatic storage, i.e. on the stack.
Now what happens in your code is that the compiler has performed constant folding to emit code as if you wrote
#This is the C version.
int twothree() {
int a = 2;
int b = 3;
return 5;
}
This is something the compiler does regardless of optimisation flags. So indeed, no addition is happening at run time. It was already performed during constant folding at compile time.
Also, quick question. So what happens to the a and b variables after last two lines? Their position in stack (or maybe we can call the 'registrars' they used as a memory place) are free now as we use malloc + free. Is it true or at least logical? popq %rbp is the command for closing the stack I guess.
The variables were stored in the red zone, a 128 byte region of memory below the stack pointer which is free for use as a scratch without having to explicitly allocate it. Thus, no code is needed to allocate or release storage for them.
Now, there is no such thing as “closing the stack.” The stack is a region in memory. The top of the stack is pointed to by the stack pointer rsp whereas the base pointer rbp points to the bottom of the current stack frame. You'll often see code like
push %rbp
mov %rsp, %rbp
...
pop %rbp
to establish and tear down a stack frame for the function at its beginning and end. Read an assembly tutorial for more details.
I know that variations of this question has been asked here multiple times, but I'm not asking what is the difference between the two. Just would like some help understanding the assembly behind both forms.
I think my question is more related to the whys than to the what of the difference.
I'm reading Prata's C Primer Plus and in the part dealing with the increment operator ++ and the difference between using i++ or ++i the author says that if the operator is used by itself, such as ego++; it doesn't matter which form we use.
If we look at the dissasembly of the following code (compiled with Xcode, Apple LLVM version 9.0.0 (clang-900.0.39.2)):
int main(void)
{
int a = 1, b = 1;
a++;
++b;
return 0;
}
we can see that indeed the form used doesn't matter, since the assembly code is the same for both (both variables would print out a 2 to the screen).
Initializaton of a and b:
0x100000f8d <+13>: movl $0x1, -0x8(%rbp)
0x100000f94 <+20>: movl $0x1, -0xc(%rbp)
Assembly for a++:
0x100000f9b <+27>: movl -0x8(%rbp), %ecx
0x100000f9e <+30>: addl $0x1, %ecx
0x100000fa1 <+33>: movl %ecx, -0x8(%rbp)
Assembly for ++b:
0x100000fa4 <+36>: movl -0xc(%rbp), %ecx
0x100000fa7 <+39>: addl $0x1, %ecx
0x100000faa <+42>: movl %ecx, -0xc(%rbp)
Then the author states that when the operator and its operand are part of a larger expression as, for example, in an assignment statement the use of prefix or postfix it does make a difference.
For example:
int main(void)
{
int a = 1, b = 1;
int c, d;
c = a++;
d = ++b;
return 0;
}
This would print 1 and 2 for c and b, respectively.
And:
Initialization of a and b:
0x100000f46 <+22>: movl $0x1, -0x8(%rbp)
0x100000f4d <+29>: movl $0x1, -0xc(%rbp)
Assembly for c = a++; :
0x100000f54 <+36>: movl -0x8(%rbp), %eax // eax = a = 1
0x100000f57 <+39>: movl %eax, %ecx // ecx = 1
0x100000f59 <+41>: addl $0x1, %ecx // ecx = 2
0x100000f5c <+44>: movl %ecx, -0x8(%rbp) // a = 2
0x100000f5f <+47>: movl %eax, -0x10(%rbp) // c = eax = 1
Assembly for d = ++b; :
0x100000f62 <+50>: movl -0xc(%rbp), %eax // eax = b = 1
0x100000f65 <+53>: addl $0x1, %eax // eax = 2
0x100000f68 <+56>: movl %eax, -0xc(%rbp) // b = eax = 2
0x100000f6b <+59>: movl %eax, -0x14(%rbp) // d = eax = 2
Clearly the assembly code is different for the assignments:
The form c = a++; includes the use of the registers eax and ecx. It uses ecx for performing the increment of a by 1, but uses eax for the assignment.
The form d = ++b; uses ecx for both the increment of b by 1 and the assignment.
My question is:
Why is that?
What determines that c = a++; requires two registers instead of just one (ecx for example)?
In the following statements:
a++;
++b;
neither of the evaluation of the expressions a++ and ++b is used. Here the compiler is actually only interested in the side effects of these operators (i.e.: incrementing the operand by one). In this context, both operators behave in the same way. So, it's no wonder that these statements result in the same assembly code.
However, in the following statements:
c = a++;
d = ++b;
the evaluation of the expressions a++ and ++b is relevant to the compiler because they have to be stored in c and d, respectively:
d = ++b;: b is incremented and the result of this increment assigned to d.
c = a++; : the value of a is first assigned to c and then a is incremented.
Therefore, these operators behave differently in this context. So, it would make sense to result in different assembly code, at least in the beginning, without more aggressive optimizations enabled.
A good compiler would replace this whole code with c = 1; d = 2;. And if those variables aren't used in turn, the whole program is one big NOP - there should be no machine code generated at all.
But you do get machine code, so you are not enabling the optimizer correctly. Discussing the efficiency of non-optimized C code is quite pointless.
Discussing a particular compiler's failure to optimize the code might be meaningful, if a specific compiler is mentioned. Which isn't the case here.
All this code shows is that your compiler isn't doing a good job, possibly because you didn't enable optimizations, and that's it. No other conclusions can be made. In particular, no meaningful discussion about the behavior of i++ versus ++i is possible.
Your test has flaws : the compiler optimized your code by replacing your value with what could be easily predicted.
The compiler can, and will, calculate the result in advance during compilation and avoid the use of 'jmp' instructions (jump to the the while each time condition is still true).
If you try this code:
int a = 0;
int i = 0;
while (i++ < 10)
{
a += i;
}
The assembly will not use a single jmp instruction.
It will directly assign value of ½ n (n + 1), here (0.5 * 10 * 6) = 30 to the register holding the value of 'a' variable
You would have the following assembly output:
mov eax, 30 ; a register
mov ecx, 10 ; i register, this line only if i is still used after.
Whether you write :
int i = 0;
while (i++ < 10)
{
...
}
or
int i = -1;
while (++i < 11)
{
...
}
will also result in the same assembly output.
If you had a much more complex code you would be able to witness differences in the assembly code.
a = ++i;
would translate into :
inc rcx ; increase i by 1, RCX holds the current value of both and i variables.
mov rax, rcx ; a = i;
and a = i++; into :
lea rax, [rcx+1] ; RAX now holds i, RCX now holds a.
mov rax, rcx ; a = i;
inc rcx ; increase i by 1
(edit: See comment below)
Both the expressions ++i and i++ have the effect of incrementing i. The difference is that ++i produces a result (a value stored somewhere, for example in a machine register, that can be used within other expressions) equal to the new value of i, whereas i++ produces a result equal to the original value of i.
So, assuming we start with i having a value of 2, the statement
b = ++i;
has the effect of setting both b and i equal to 3, whereas;
b = i++;
has the effect of setting b equal to 2 and i equal to 3.
In the first case, there is no need to keep track of the original value of i after incrementing i whereas in the second there is. One way of doing this is for the compiler to employ an additional register for i++ compared with ++i.
This is not needed for a trivial expression like
i++;
since the compiler can immediately detect that the original value of i will not be used (i.e. is discarded).
For simple expressions like b = i++ the compiler could - in principle at least - avoid using an additional register, by simply storing the original value of i in b before incrementing i. However, in slightly more complex expressions such as
c = i++ - *p++; // p is a pointer
it can be much more difficult for the compiler to eliminate the need to store old and new values of i and p (unless, of course, the compiler looks ahead and determines how (or if) c, i, and p (and *p) are being used in subsequent code). In more complex expressions (involving multiple variables and interacting operations) the analysis needed can be significant.
It then comes down to implementation choices by developers/designers of the compiler. Practically, compiler vendors compete pretty heavily on compilation time (getting compilation times as small as possible) and, in doing so, may choose not to do all possible code transformations that remove unneeded uses of temporaries (or machine registers).
You compiled with optimization disabled! For gcc and LLVM, that means each C statement is compiled independently, so you can modify variables in memory with a debugger, and even jump to a different source line. To support this, the compiler can't optimize between C statements at all, and in fact spills / reloads everything between statements.
So the major flaw in your analysis is that you're looking at an asm implementation of that statement where the inputs and outputs are memory, not registers. This is totally unrealistic: compilers keep most "hot" values in registers inside inner loops, and don't need separate copies of a value just because it's assigned to multiple C variables.
Compilers generally (and LLVM in particular, I think) transform the input program into an SSA (Static Single Assignment) internal representation. This is how they track data flow, not according to C variables. (This is why I said "hot values", not "hot variables". A loop induction variable might be totally optimized away into a pointer-increment / compare against end_pointer in a loop over arr[i++]).
c = ++i; produces one value with 2 references to it (one for c, one for i). The result can stay in a single register. If it doesn't optimize into part of some other operation, the asm implementation could be as simple as inc %ecx, with the compiler just using ecx/rcx everywhere that c or i is read before the next modification of either. If the next modification of c can't be done non-destructively (e.g. with a copy-and-modify like lea (,%rcx,4), %edx or shrx %eax, %ecx, %edx), then a mov instruction to copy the register will be emitted.
d = b++; produces one new value, and makes d a reference to the old value of b. It's syntactic sugar for d=b; b+=1;, and compiles into SSA the same as that would. x86 has a copy-and-add instruction, called lea. The compiler doesn't care which register holds which value (except in loops, especially without unrolling, when the end of the loop has to have values in the right registers to jump to the beginning of the loop). But other than that, the compiler can do lea 1(%rbx), %edx to leave %ebx unmodified and make EDX hold the incremented value.
An additional minor flaw in your test is that with optimization disabled, the compiler is trying to compile quickly, not well, so it doesn't look for all possible peephole optimizations even within the statement that it does allow itself to optimize.
If the value of c or d is never read, then it's the same as if you had never done the assignment in the first place. (In un-optimized code, every value is implicitly read by the memory barrier between statements.)
What determines that c = a++; requires two registers instead of just one (ecx for example)?
The surrounding code, as always. +1 can be optimized into other operations, e.g. done with an LEA as part of a shift and/or add. Or built in to an addressing mode.
Or before/after negation, use the 2's complement identity that -x == ~x+1, and use NOT instead of NEG. (Although often you're adding the negated value to something, so it turns into a SUB instead of NEG + ADD, so there isn't a stand-alone NEG you can turn into a NOT.)
++ prefix or postfix is too simple to look at on its own; you always have to consider where the input comes from (does the incremented value have to end up back in memory right away or eventually?) and how the incremented and original values are used.
Basically, un-optimized code is un-interesting. Look at optimized code for short functions. See Matt Godbolt's talk at CppCon2017: “What Has My Compiler Done for Me Lately? Unbolting the Compiler's Lid”, and also How to remove "noise" from GCC/clang assembly output? for more about looking at compiler asm output.
Consider the two slightly different versions of the same code:
struct s
{
int dummy[1];
};
volatile struct s s;
int main(void)
{
s;
return 0;
}
and
struct s
{
int dummy[16];
};
volatile struct s s;
int main(void)
{
s;
return 0;
}
Here's what I'm getting with gcc 4.6.2 for them:
_main:
pushl %ebp
movl %esp, %ebp
andl $-16, %esp
call ___main
movl _s, %eax
xorl %eax, %eax
leave
ret
.comm _s, 4, 2
and
_main:
pushl %ebp
movl %esp, %ebp
andl $-16, %esp
call ___main
xorl %eax, %eax
leave
ret
.comm _s, 64, 5
Please note the absence of access to s in the second case.
Is it a compiler bug or am I just dealing with the following statement of the C standard and the gcc developers simply chose such a weird implementation-definedness and are still playing by the rules?:
What constitutes an access to an object that has volatile-qualified type is implementation-defined.
What would be the reason for this difference? I'd naturally expect the whole structre being accessed (or not accessed, I'm not sure), irrespective of its size and of what's inside it.
P.S. What does your compiler (non-gcc or newer gcc) do in this case? (please answer this last question in a comment if that's the only part you're going to address, as this isn't the main question being asked, but more of a curiosity question).
There is a difference between C and C++ for this question which explains what's going on.
clang-3.4
When compiling either of these snippets as C++, the emitted assembly didn't reference s in either case. In fact a warning was issued for both:
volatile.c:8:2: warning: expression result unused; assign into a variable to force a volatile load [-Wunused-volatile-lvalue]
s;
These warnings were not issued when compiling in C99 mode. As mentioned in this blog post and this GCC wiki entry from the question comments, using s in this context causes an lvalue-to-rvalue conversion in C, but not in C++. This is confirmed by examining the Clang AST for C, as there is an ImplicitCastExpr from LvalueToRValue, which does not exist in the AST generated from C++. (The AST is not affected by the size of the struct).
A quick grep of the Clang source reveals this in the emission of aggregate expressions:
case CK_LValueToRValue:
// If we're loading from a volatile type, force the destination
// into existence.
if (E->getSubExpr()->getType().isVolatileQualified()) {
EnsureDest(E->getType());
return Visit(E->getSubExpr());
}
EnsureDest forces the emission of a stack slot, sized and typed for the expression. As the optimizers are not allowed to remove volatile accesses, they remain as a scalar load/store and a memcpy respectively in both the IR and output asm. This is the behavior I would expect, given the above.
gcc-4.8.2
Here, I observe the same behavior as in the question. However when I change the expression from s; to s.dummy;, the access does not appear in either version. I'm not familiar with the internals of gcc as I am with LLVM so I can't speculate why this would happen. But based on the above observations, I would say this is a compiler bug due to inconsistency.
I've written this simple C code
int main()
{
int calc = 2+2;
return 0;
}
And I want to see how that looks in assembly, so I compiled it using gcc
$ gcc -S -o asm.s test.c
And the result was ~65 lines (Mac OS X 10.8.3) and I only found these to be related:
Where do I look for my 2+2 in this code?
Edit:
One part of the question hasn't been addressed.
If %rbp, %rsp, %eax are variables, what values do they attain in this case?
Almost all of the code you got is just useless stack manipulation. With optimization on (gcc -S -O2 test.c) you will get something like
main:
.LFB0:
.cfi_startproc
xorl %eax, %eax
ret
.cfi_endproc
.LFE0:
Ignore every line that starts with a dot or ends with a colon: there are only two assembly instructions:
xorl %eax, %eax
ret
and they encode return 0;. (XORing a register with itself sets it to all-bits-zero. Function return values go in register %eax per the x86 ABI.) Everything to do with your int calc = 2+2; has been discarded as unused.
If you changed your code to
int main(void) { return 2+2; }
you would instead get
movl $4, %eax
ret
where the 4 comes from the compiler doing the addition itself rather than making the generated program do it (this is called constant folding).
Perhaps more interesting is if you change the code to
int main(int argc, char **argv) { return argc + 2; }
then you get
leal 2(%rdi), %eax
ret
which is doing some real work at runtime! In the 64-bit ELF ABI, %rdi holds the first argument to the function, argc in this case. leal 2(%rdi), %eax is x86 assembly language for "%eax = %edi + 2" and it's being done this way mainly because the more familiar add instruction takes only two arguments, so you can't use it to add 2 to %rdi and put the result in %eax all in one instruction. (Ignore the difference between %rdi and %edi for now.)
The compiler determined that 2+2 = 4 and inlined it. The constant is stored in line 10 (the $4). To verify this, change the math to 2+3 and you will see $5
EDIT: as for the registers themselves, %rsp is the stack pointer, %rbp is the frame pointer, and %eax is a general register
Here is an explanation of the assembly code:
pushq %rbp
This saves a copy of the frame pointer on the stack. The function itself does not need this; it is there so that debuggers or exception handlers can find frames on the stack.
movq %rsp, %rbp
This starts a new frame by setting the frame pointer to point to the current top-of-stack. Again, the function does not need this; it is housekeeping to maintain a proper stack.
mov $4, -12(%rbp)
Here the compiler initializes calc to 4. Several things have happened here. First, the compiler evaluated 2+2 by itself and used the result, 4, in the assembly code. The arithmetic is not performed in the executing program; it was completed in the compiler. Second, calc has been assigned the location 12 bytes below the frame pointer. (This is interesting because it is also below the stack pointer. The OS X ABI for this architecture includes a “red zone” below the stack pointer that programs are permitted to use, which is unusual.) Third, the program was clearly compiled without optimization. We know that because the optimizer would recognize that this code has no effect and is useless, so it would remove it.
movl $0, -8(%rbp)
This code stores 0 in the place the compiler has set aside to prepare the return value of main.
movl -8(%rbp), %eax
movl %eax, -4(%rbp)
This copies data from the place where the return value is prepared to a temporary handling location. This is even more useless than the previous code, reinforcing the conclusion that optimization was not used. This looks like code I would expect at a negative optimization level.
movl -4(%rbp), %eax
This moves the return value from the temporary handling location to the register in which it is returned to the caller.
popq %rbp
This restores the frame pointer, thus removing the previously-pushed frame from the stack.
ret
This puts the program out of its misery.
Your program has no observable behavior, which means that in general case the compiler might not generate any machine code for it at all, besides some minimal startup-wrapup instructions intended to ensure that zero is returned to the calling environment. At least declare your variable as volatile. Or print its value after evaluating it. Or return it from main.
Also note that in C language 2 + 2 qualifies as integral constant expression. This means that compiler is not just allowed, but actually required to know the result of that expression at compile time. Taking this into account, it would be strange to expect the compiler to evaluate 2 + 2 at run time when the final value is known at compile time (even if you completely disable optimizations).
The compiler optimized it away, it pre-computed the answer and just set the result. If you want to see the compiler do the add then you cannot let it "see" the constants you are feeding it
If you compile this code all by itself as an object (gcc -O2 -c test_add.c -o test_add.o)
then you will force the compiler to generate the add code. But the operands will be registers or on the stack.
int test_add ( int a, int b )
{
return(a+b);
}
Then if you call it from code in a separate source (gcc -O2 -c test.c -o test.o) then you will see the two operands be forced into the function.
extern int test_add ( int, int );
int test ( void )
{
return(test_add(2,2));
}
and you can disassemble both of those objects (objdump -D test.o, objdump -D test_add.o)
When you do something that simple in one file
int main ( void )
{
int a,b,c;
a=2;
b=2;
c=a+b;
return(0);
}
The compiler can optimize your code into one of a few equivalents. My example here, does nothing, the math and results have no purpose, they are not used, so they can simply be removed as dead code. Your opitmization did this
int main ( void )
{
int c;
c=4;
return(0);
}
But this is also a perfectly valid optimization of the above code
int main ( void )
{
return(0);
}
EDIT:
Where is the calc=2+2?
I believe the
movl $4,-12(%rbp)
Is the 2+2 (the answer is computed and simply placed in calc which is on the stack.
movl $0,-8(%rbp)
I assume is the 0 in your return(0);
The actual math of adding two numbers was optimized out.
I guess line 10, he optimzed since all are constants
I have code that is doing a lot of these comparison operations. I was wondering which is the most efficient one to use. Is it likely that the compiler will correct it if I intentionally choose the "wrong" one?
int a, b;
// Assign a value to a and b.
// Now check whether either is zero.
// The worst?
if (a * b == 0) // ...
// The best?
if (a & b == 0) // ...
// The most obvious?
if (a == 0 || b == 0) // ...
Other ideas?
In general, if there's a fast way of doing a simple thing, you can assume the compiler will do it that fast way. And remember that the compiler is outputting machine language, not C -- the fastest method probably can't be correctly represented as a set of C constructs.
Also, the third method there is the only one that always works. The first one fails if a and b are 1<<16, and the second you already know doesn't work.
It's possible to see which variant generates fewer assembly instructions, but it's a separate matter to see which one actually executes in less time.
To help you analyze the first matter, learn to use your C compiler's command-line flags to capture its intermediate output. GCC is a common choice for a C compiler. Let's look at its unoptimized assembly code for two different programs.
#include <stdio.h>
void report_either_zero()
{
int a = 1;
int b = 0;
if (a == 0 || b == 0)
{
puts("One of them is zero.");
}
}
Save that text to a file such as zero-test.c, and run the following command:
gcc -S zero-test.c
GCC will emit a file called zero-test.s, which is the assembly code it would normally submit to the assembler as it generates object code.
Let's look at the relevant fragment of the assembly code. I'm using gcc version 4.2.1 on Mac OS X generating x86 64-bit instructions.
_report_either_zero:
Leh_func_begin1:
pushq %rbp
Ltmp0:
movq %rsp, %rbp
Ltmp1:
subq $32, %rsp
Ltmp2:
movl %edi, -4(%rbp)
movq %rsi, -16(%rbp)
movl $1, -20(%rbp) // a = 1
movl $0, -24(%rbp) // b = 0
movl -24(%rbp), %eax // Get ready to compare a.
cmpl $0, %eax // Does zero equal a?
je LBB1_2 // If so, go to label LBB1_2.
movl -24(%rbp), %eax // Otherwise, get ready to compare b.
cmpl $0, %eax // Does zero equal b?
jne LBB1_3 // If not, go to label LBB1_3.
LBB1_2:
leaq L_.str(%rip), %rax
movq %rax, %rdi
callq _puts // Otherwise, write the string to standard output.
LBB1_3:
addq $32, %rsp
popq %rbp
ret
Leh_func_end1:
You can see where the we load the integer values 1 and 0 into registers, then prepare to compare the first to zero, and then again with the second if the first is nonzero.
Now let's try a different approach with the comparison, to see how the assembly code changes. Note that this is not the same predicate; this one checks whether both numbers are zero.
#include <stdio.h>
void report_both_zero()
{
int a = 1;
int b = 0;
if (!(a | b))
{
puts("Both of them are zero.");
}
}
The assembly code is a little different:
_report_both_zero:
Leh_func_begin1:
pushq %rbp
Ltmp0:
movq %rsp, %rbp
Ltmp1:
subq $16, %rsp
Ltmp2:
movl $1, -4(%rbp) // a = 1
movl $0, -8(%rbp) // b = 0
movl -4(%rbp), %eax // Get ready to operate on a.
movl -8(%rbp), %ecx // Get ready to operate on b too.
orl %ecx, %eax // Combine a and b via bitwise OR.
cmpl $0, %eax // Does zero equal the result?
jne LBB1_2 // If not, go to label LBB1_2.
leaq L_.str(%rip), %rax
movq %rax, %rdi
callq _puts // Otherwise, write the string to standard output.
LBB1_2:
addq $16, %rsp
popq %rbp
ret
Leh_func_end1:
If the first number is zero, the first variant does less work—in terms of the number of assembly instructions involved—by avoiding a second register move. If the first number is not zero, the second variant does less work by avoiding a second comparison to zero.
The question now is whether "move, move, bitwise or, compare" runs faster that "move, compare, move, compare." The answer could come down to things like whether the processor learns to predict how often the first integer is zero, and whether it is or not consistently.
If you ask the compiler to optimize this code, the example is too simple; the compiler decides at compile time that no comparison is necessary, and just condenses that code to an unconditional request to write the string. It's interesting to change the code to operate on parameters rather than constants, and see how the optimizer handles the situation differently.
Variant one:
#include <stdio.h>
void report_either_zero(int a, int b)
{
if (a == 0 || b == 0)
{
puts("One of them is zero.");
}
}
Variant two (again, a different predicate):
#include <stdio.h>
void report_both_zero(int a, int b)
{
if (!(a | b))
{
puts("Both of them are zero.");
}
}
Generate the optimized assembly code with this command:
gcc -O -S zero-test.c
Let us know what you find.
This will not likely have much (if any, given modern compiler optimizers) effect on the overall performance of your app. If you really must know, you should write some code to test the performance of each for your compiler. However, as a best guess, I'd say...
if ( !( a && b ) )
This will short-circuit if the first happens to be 0.
If you want to find whether or not one of two integers is zero using one comparison instruction ...
if ((a << b) == a)
If a is zero, then no amount of shifting it to the left will change its value.
If b is zero, then there is no shifting performed.
It is possible (I am too lazy to check) that there is some undefined behaviour should b be negative or really large.
However, due to the non-intuitiveness, it would be strongly recommended to implement this as a macro (with an appropriate comment).
Hope this helps.
The most efficient is certainly the most obvious, if by efficiency, you are measuring the programmer's time.
If by measuring efficiency using the processor's time, profiling your candidate solution is the best way to answer - for the target machine you profiled.
But this exercise demonstrated a pitfall of programmer optimization. The 3 candidates are not functionally equivalent for all int.
If you was a functional equivalent alternative...
I think the last candidate and a 4th one deserve comparison.
if ((a == 0) || (b == 0))
if ((a == 0) | (b == 0))
Due to the variation of compilers, optimization and CPU branch prediction, one should profile, rather than pontificate, to determine relative performance. OTOH, a good optimizing compiler may give you the same code for both.
I recommend the code that is easiest to maintain.
There's no "most efficient way to do it in C", if by "efficiency" you mean the efficiency of the compiled code.
Firstly, even if we assume that the compiler translates C language operator into their "obvious" machine counterparts (i.e. C multiplication into machine multiplication etc) the efficiency of each method will differ from one hardware platform to the other. Even if we restrict our consideration to a very specific sequence of instructions on a very specific hardware platform, it still can exhibit different performance in different surrounding contexts, depending, for example, on how well the whole thing agrees with the branch prediction heuristic in the given CPU.
Secondly, modern C compilers rarely translate C operators into their "obvious" machine counterparts. Often the instructions used in machine code will have very little in common with the C code. It is possible that many "completely different" methods of performing the check at C level will actually be translated into the same sequence of machine instructions by a smart compiler. At the same time the same C code might get translated into different sequences machine instructions when the surrounding contexts are different.
In other words, there's no meaningful answer to your question, unless you really really really localize it to a specific hardware platform, specific compiler version and specific set of compilation settings. And that will make it too localized to be useful.
That usually means that in general case the best way to do it is to write the most readable code. Just do
if (a == 0 || b == 0)
The readability of the code will not only help the human reader to understand it, but will also increase the probability of the compiler properly interpreting your intent and generating the most optimal code.
But if you really have to squeeze the last CPU cycle out of your performance-critical code, you have to try different versions and compare their relative efficiency manually.