Is there any difference in execution speed between n /= 10 and n = n / 10?
I ask because n-- and --n are said to differ in execution speed as well.
No, not really:
[C99: 6.5.16.2/3]: A compound assignment of the form E1 op= E2 differs from the simple assignment expression E1 = E1 op (E2) only in that the lvalue E1 is evaluated only once.
So, this has consequences only if your n is a non-trivial expression with side-effects (such as a function call).
Otherwise, I suppose in theory an intermediate temporary variable will be involved, but you'd have to be remarkably unlucky for such a temporary to actually survive in your compiled executable. You're not going to see any performance difference between the two approaches.
Confirm this with benchmarks, and by comparing the resulting assembly.
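To make the single-evaluation rule quoted above concrete, here is a minimal sketch in which E1 has a side effect. The names calls, next_index, compound_calls and simple_calls are made up for illustration and do not come from the question.

```c
static int calls;   /* counts how often the index expression is evaluated */

/* Hypothetical index function with a visible side effect. */
static int next_index(void)
{
    return calls++;
}

/* E1 op= E2: the lvalue a[next_index()] is evaluated only once. */
int compound_calls(void)
{
    int a[4] = {100, 200, 300, 400};
    calls = 0;
    a[next_index()] /= 10;
    return calls;
}

/* E1 = E1 op E2: the index expression is written twice, so it runs twice.
   The order of the two calls is unspecified, but both stay within bounds. */
int simple_calls(void)
{
    int a[4] = {100, 200, 300, 400};
    calls = 0;
    a[next_index()] = a[next_index()] / 10;
    return calls;
}
```

With a plain variable such as n there is no side effect to repeat, which is why the two forms are interchangeable in the question's case.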
Given the following C-code:
int f1(int n) {
n /= 10;
return n;
}
int f2(int n) {
n = n / 10;
return n;
}
compiled with gcc -O4 essentially results in
f1:
movl %edi, %eax
movl $1717986919, %edx
sarl $31, %edi
imull %edx
sarl $2, %edx
subl %edi, %edx
movl %edx, %eax
ret
f2:
movl %edi, %eax
movl $1717986919, %edx
sarl $31, %edi
imull %edx
sarl $2, %edx
subl %edi, %edx
movl %edx, %eax
ret
I have omitted some boilerplate which is part of the listing in reality.
In this specific case, there is no difference between the two alternatives.
Depending on the compiler used, on the actual environment where the instructions are executed and on compiler optimization levels, the generated code might be different. But you can always use this approach to check if the resulting machine code differs or not.
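As an aside, the listing contains no division instruction at all: gcc replaced the division by 10 with a multiplication by the constant 1717986919 (0x66666667) followed by shifts. The following is a rough C rendering of that instruction sequence. The function name div10_magic is hypothetical, and the sketch assumes that >> on a negative signed value is an arithmetic shift (true for gcc, but implementation-defined in the standard).

```c
#include <stdint.h>

/* Sketch of the multiply-and-shift sequence that gcc emitted for n / 10. */
int32_t div10_magic(int32_t n)
{
    int64_t product = (int64_t)n * 1717986919; /* imull: 64-bit product in edx:eax */
    int32_t high = (int32_t)(product >> 32);   /* edx holds the upper 32 bits */
    int32_t q = high >> 2;                     /* sarl $2, %edx */
    int32_t sign = n >> 31;                    /* sarl $31, %edi: 0 or -1 */
    return q - sign;                           /* subl %edi, %edx: round toward zero */
}
```

Both f1 and f2 receive exactly this treatment, which is another way of seeing that the compiler treats the two spellings as the same operation.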
There is no difference between the two.
I checked both expressions with the Keil cross compiler, and they require the same execution time:
=================================================
5: x=x/5;
6:
C:0x0005 EF MOV A,R7
C:0x0006 75F005 MOV B(0xF0),#0x05
C:0x0009 84 DIV AB
7: x/=5;
C:0x000A 75F005 MOV B(0xF0),#0x05
C:0x000D 84 DIV AB
C:0x000E FF MOV R7,A
================================================
So there is no difference at all, just as with --n and n--.
Suppose I have the following C code:
#include <stdio.h>
int main()
{
int x = 11;
int y = x + 3;
printf("%d\n", x);
return 0;
}
When I compile it to assembly with gcc, I get this (with some flags removed):
main:
pushq %rbp
movq %rsp, %rbp
subq $16, %rsp
movl $11, -4(%rbp)
movl -4(%rbp), %eax
addl $3, %eax
movl %eax, -8(%rbp)
movl -4(%rbp), %eax
movl %eax, %esi
movl $.LC0, %edi
movl $0, %eax
call printf
movl $0, %eax
leave
ret
My question is: why is it movl -4(%rbp), %eax followed by movl %eax, %esi, rather than a simple movl -4(%rbp), %esi (which works fine in my experiments)?
You probably did not enable optimizations.
Without optimization the compiler produces code like this. For one thing, it does not keep variables in registers but on the stack, so whenever you operate on a variable it is first loaded into a register and then operated on.
Here x lives at -4(%rbp), and the code looks the way it would if you translated each statement directly, one at a time. First you move 11 into the storage of x:
movl $11, -4(%rbp)
That completes the first statement. The next statement evaluates x+3 and places the result in the storage of y (which is -8(%rbp)); this is done without regard to the previously generated code:
movl -4(%rbp), %eax
addl $3, %eax
movl %eax, -8(%rbp)
That completes the second statement, which is split into two parts: evaluating x+3 and storing the result. The compiler then generates code for the printf statement, again without taking earlier statements into account. That is why x is first reloaded from -4(%rbp) into %eax and only then copied into %esi.
If, on the other hand, you enable optimization, the compiler does a number of smart and, to humans, obvious things. One is that it allows variables to be allocated to registers, or at least keeps track of where the value of a variable can be found. In this case the compiler would know in the second statement that x is not only stored at -4(%rbp) but also that its value is the constant 11. It can use this to add 3, so it knows the result is 14 (but it is smarter than that: it has also seen that you never use that variable, so it skips the statement entirely). The next statement is the printf call, and here it can use the fact that x is 11 and pass that value directly to printf. It also realizes that it never needs the storage for x at -4(%rbp). Finally, it may know what printf does (since you included stdio.h), so it can analyze the format string and do the conversion at compile time, replacing the printf statement with a call that writes 11 directly to standard output.
The following code computes the product of x and y and stores the result in memory. Data type ll_t is defined to be equivalent to long long.
typedef long long ll_t;
void store_prod(ll_t *dest, int x, ll_t y)
{
*dest = x*y;
}
gcc generates the following assembly code implementing the computation:
dest at %ebp+8, x at %ebp+12, y at %ebp+16
1 movl 16(%ebp), %esi
2 movl 12(%ebp), %eax
3 movl %eax, %edx
4 sarl $31, %edx
5 movl 20(%ebp), %ecx
6 imull %eax, %ecx
7 movl %edx, %ebx
8 imull %esi, %ebx
9 addl %ebx, %ecx
10 mull %esi
11 leal (%ecx,%edx), %edx
12 movl 8(%ebp), %ecx
13 movl %eax, (%ecx)
14 movl %edx, 4(%ecx)
This code uses three multiplications to implement the multi precision arithmetic required to implement 64-bit arithmetic
on a 32-bit machine. Describe the algorithm used to compute the product, and annotate the assembly code to show how
it realizes your algorithm.
Question: What does line 5 do? What value is it moving into register ecx?
Also, what does line 11 do?
Line 5: it copies the upper 32 bits of y into ECX. The listing states that y is at %ebp+16; since y is a 64-bit long long on a 32-bit machine, its low half is at 16(%ebp) (loaded by line 1) and its high half is at 20(%ebp).
Line 11: it is equivalent to EDX = EDX + ECX. The LEA instruction computes the effective address (EA) of a memory operand and stores that address in the destination register without accessing memory, so it can also be used for quick additions and multiplications by small constants.
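For the "describe the algorithm" part, here is a sketch in C of what the three multiplications in the listing appear to compute. The function name mul_pieces and the variable names are made up for illustration; the comments map each step to the numbered assembly lines. The final conversion back to a signed value assumes a two's complement machine.

```c
#include <stdint.h>

/* Computes (long long)x * y using only 32-bit pieces, mirroring the listing. */
int64_t mul_pieces(int32_t x, int64_t y)
{
    uint32_t y_lo = (uint32_t)y;                   /* line 1: 16(%ebp) */
    uint32_t y_hi = (uint32_t)((uint64_t)y >> 32); /* line 5: 20(%ebp) */
    uint32_t x_lo = (uint32_t)x;                   /* line 2: 12(%ebp) */
    uint32_t x_hi = x < 0 ? 0xFFFFFFFFu : 0u;      /* lines 3-4: sign extension of x */

    /* lines 6-9: the two cross terms; only their low 32 bits matter */
    uint32_t cross = x_lo * y_hi + x_hi * y_lo;

    /* line 10: unsigned 32x32 -> 64 multiply of the low halves (mull) */
    uint64_t low = (uint64_t)x_lo * y_lo;

    /* line 11: add the cross terms to the upper half of the low product */
    uint32_t hi = cross + (uint32_t)(low >> 32);

    /* lines 13-14: store low word and high word */
    uint64_t bits = ((uint64_t)hi << 32) | (uint32_t)low;
    return (int64_t)bits;
}
```

The x_hi * y_hi term is never computed because it would only contribute to bits 64 and above, which do not fit in the result.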
I just compiled the following C code to test out the gcc optimizer (using the -O3 flag), expecting that both functions would end up generating the same set of assembly instructions:
int test1(int a, int b)
{
#define x (a*a*a+b)
#define y (a*b*a+3*b)
return x*x+x*y+y;
#undef x
#undef y
}
int test2(int a, int b)
{
int x = a*a*a+b;
int y = a*b*a+3*b;
return x*x+x*y+y;
}
But I was surprised to find that they generated slightly different assembly, and that the execution time for test1 (the code using the preprocessor instead of local variables) was a bit faster.
I've heard people say that the compiler can optimize better than humans can, and that you should tell it exactly what you want it to do; man I guess they weren't kidding. I thought the compiler was supposed to kind of guess at the programmer's intended use of local variables and replace their use if necessary... is that a false assumption?
When writing code for performance, are you better off using preprocessor definitions for the sake of readability rather than local variables? I know it looks ugly as hell, but apparently it actually makes a difference, unless I'm missing something.
Here's the assembly I got, using "gcc test.c -O3 -S". My gcc version is 4.8.2; it looks like the assembly output is the same for most versions of gcc, but for some reason not for the 4.7 or 4.8 versions.
test1
movl %edi, %eax
movl %edi, %edx
leal (%rsi,%rsi,2), %ecx
imull %edi, %eax
imull %esi, %edx
imull %edi, %eax
imull %edi, %edx
addl %esi, %eax
addl %ecx, %edx
leal (%rax,%rdx), %ecx
imull %ecx, %eax
addl %edx, %eax
ret
test2
movl %edi, %eax
leal (%rsi,%rsi,2), %edx
imull %edi, %eax
imull %edi, %eax
leal (%rax,%rsi), %ecx
movl %edi, %eax
imull %esi, %eax
imull %edi, %eax
addl %eax, %edx
leal (%rcx,%rdx), %eax
imull %ecx, %eax
addl %edx, %eax
ret
Trying your code at godbolt, I get identical assembly for both functions with GCC, even with just the -O setting. Only when I omit the -O flag do I get different results. This is really to be expected, because the code is trivial to optimize.
Here is generated assembly using gcc 4.4.7 with -O flag. As you can see they are identical.
test1(int, int):
movl %edi, %eax
imull %edi, %eax
imull %eax, %edi
addl $3, %eax
imull %esi, %eax
addl %esi, %edi
leal (%rax,%rdi), %edx
imull %edi, %edx
leal (%rdx,%rax), %eax
ret
test2(int, int):
movl %edi, %eax
imull %edi, %eax
imull %eax, %edi
addl $3, %eax
imull %esi, %eax
addl %esi, %edi
leal (%rax,%rdi), %edx
imull %edi, %edx
leal (%rdx,%rax), %eax
ret
The answer is twofold:
Your statement about identical results is a misconception
I cannot reproduce your results "test1 faster than test2".
Preprocessor misconception
The results should not be identical. The preprocessor acts on (transforms) the source before it is actually compiled by the compiler with whatever options.
You can inspect the result of the preprocessor by running gcc -E main.c for example, assuming you are using a GNU compiler and your sources above are stored in a file main.c. The relevant parts become:
int test1(int a, int b)
{
return (a*a*a+b)*(a*a*a+b)+(a*a*a+b)*(a*b*a+3*b)+(a*b*a+3*b);
}
int test2(int a, int b)
{
int x = a*a*a+b;
int y = a*b*a+3*b;
return x*x+x*y+y;
}
Obviously, the first version uses roughly two times more mathematical operations than the second one. Then the compiler and its optimiser come into play …
(NB: Ideally you could analyse the number of CPU cycles generated by the assembler code. Use e.g. gcc -S main.c and look at main.s; you probably know that. Version 2 should "win" in that case.)
Runtime testing and optimising
In order to compare our results, you should post your test code. When testing you need to average out short term fluctuations and time granularity limits of your CPU. Hence you are likely to run in loops over the same code.
int i=100000000;
while (--i>0) {
int r;
r = test1(3, 4);
}
Without the optimiser, test1 runs about 20% slower than test2.
However, the optimiser also analyses the calling code and can optimise away repeated calls with identical arguments, or calls whose result is never used (r in this case).
Therefore you must force the compiler to actually make the calls, for example:
int r = 0;
while (--i>0) {
r += test1(3, i);
}
When I tried that, I got identical runtimes to within about one percent. That is, sometimes test1 is faster and sometimes test2 is faster when I repeat the comparison several times.
You should look into the optimiser documentation to understand which optimising options you need to outsmart in your tests.
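For reference, a self-contained timing harness along these lines could look as follows. This is only a sketch: time_loop and its parameters are hypothetical names, clock() measures CPU time with limited resolution, and an aggressive optimiser may still inline or simplify the calls. The iteration count should stay below roughly 10000 so that the int arithmetic in test1 and test2 does not overflow.

```c
#include <time.h>

int test1(int a, int b)
{
#define x (a*a*a+b)
#define y (a*b*a+3*b)
    return x*x+x*y+y;
#undef x
#undef y
}

int test2(int a, int b)
{
    int x = a*a*a+b;
    int y = a*b*a+3*b;
    return x*x+x*y+y;
}

/* Calls fn(3, i) for i = 1..iters and accumulates the results into *sum,
   so the calls have an observable effect and cannot simply be dropped.
   Returns the elapsed CPU time in seconds. */
double time_loop(int (*fn)(int, int), int iters, unsigned long long *sum)
{
    unsigned long long acc = 0;
    clock_t start = clock();
    for (int i = 1; i <= iters; ++i)
        acc += (unsigned long long)fn(3, i);
    clock_t stop = clock();
    *sum = acc;
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Comparing the two accumulated sums is also a cheap sanity check that both functions compute the same values before any timing is trusted.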
And I confirm what @Ville Krumlinde states: I get identical code in the assembly output, even with -O level optimisation (gcc 4.4.7 on my desktop). The code contains only 9 assembler operations, which makes me believe that the optimiser "knows" enough about algebraic optimisation to simplify your formulas.
So you may just have been misled by an optimiser artefact in your test harness after all.
Is there any performance or memory advantage to using the ternary operator over if/else (or vice versa)?
For example a case below:
int x=0, y=1, z=2, a=0;
a= x ? y : z;
alternative:
if ( x != 0 ){
a = y;
}else{
a = z;
}
If you look at the disassembly of both approaches, they're generally the same on any modern compiler I know of. The ternary operator is just a compact form of writing the same thing.
Here's an example using gcc 4.2.1 on Mac OS X:
With if/else:
int x = 1;
int y = 2;
int z;
if (x < y)
{
z = 3;
}
else
{
z = 4;
}
With the ternary operator:
int x = 1;
int y = 2;
int z = (x < y) ? 3 : 4;
If you run gcc -S test.c on both of these, you get this assembly for the if/else version:
movl $1, -16(%rbp)
movl $2, -20(%rbp)
movl -16(%rbp), %eax
movl -20(%rbp), %ecx
cmpl %ecx, %eax
jge LBB1_2
movl $3, -12(%rbp)
jmp LBB1_3
LBB1_2:
movl $4, -12(%rbp)
and this for the ternary operator version:
movl $1, -12(%rbp)
movl $2, -16(%rbp)
movl -12(%rbp), %eax
movl -16(%rbp), %ecx
cmpl %ecx, %eax
jge LBB1_2
movl $3, -20(%rbp)
jmp LBB1_3
LBB1_2:
movl $4, -20(%rbp)
The stack offsets are different, but functionally the code does the same thing. It stores the two literals in two stack slots, loads them into registers, then compares them and jumps based on the result of the comparison.
Compilers are generally smart enough to optimize both into the same instructions. Choose the ternary operator where it reads better, not on the assumption that it compiles to faster code.
On any modern compiler there is generally no difference between those two.
Therefore it is only a question of readability and maintainability of your code.
The only "advantage" is that you can use the ternary operator inside an expression (e.g. in function arguments), making for terser code. Using an if, you would have to duplicate the full expression.
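As a small illustration of the point about expressions, compare the two hypothetical helpers below (format_count and format_count_if are made-up names). The ternary version selects the suffix inside the argument list, while the if/else version has to spell out the call twice.

```c
#include <stdio.h>

/* Ternary inside the argument list: one call, one format string. */
int format_count(char *buf, size_t size, int n)
{
    return snprintf(buf, size, "%d item%s", n, n == 1 ? "" : "s");
}

/* Equivalent if/else: the whole call is duplicated. */
int format_count_if(char *buf, size_t size, int n)
{
    if (n == 1)
        return snprintf(buf, size, "%d item", n);
    else
        return snprintf(buf, size, "%d items", n);
}
```

Both produce the same text, so the choice is about how the source reads, not about what it does.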
Use whichever is most readable in your particular circumstances.
Worry about efficiency only when you have measured that you have a performance problem.
In all likelihood, the compiler will generate the same code.
Why is the output 44 instead of 12? I guess the stack layout for the code given below should be &x, eip, ebp, &j; if so, the output must be 12, but I am getting 44.
Please help me understand the concept. Although the base pointer is relocated on every execution, the relative distance must remain unchanged, so shouldn't it be 12?
int main() {
int x = 9;
// printf("%p is the address of x\n",&x);
fun(&x );
printf("%x is the address of x\n", (&x));
x = 3;
printf("%d x is \n",x);
return 0;
}
int fun(unsigned int *ptr) {
int j;
printf("the difference is %u\n",((unsigned int)ptr -(unsigned int) &j));
printf("the address of j is %x\n",&j);
return 0;
}
You're assuming that the compiler has packed everything (instead of putting things on specific alignment boundaries), and you're also assuming that the compiler hasn't inlined the function, or made any other optimisations or transformations.
In summary, you cannot make any assumptions about this sort of thing, or rely on any particular behaviour.
No, it should not be 12. It should not be anything. The ISO standard has very little to say about how things are arranged on the stack. An implementation has a great deal of leeway in moving things around and inserting padding for efficiency.
If you pass it through your compiler with an option to generate the assembler code (such as with gcc -S), it will become evident what is happening here.
fun:
pushl %ebp
movl %esp, %ebp
subl $40, %esp ;; this is quite large (see below).
movl 8(%ebp), %edx
leal -12(%ebp), %eax
movl %edx, %ecx
subl %eax, %ecx
movl %ecx, %eax
movl %eax, 4(%esp)
movl $.LC2, (%esp)
call printf
leal -12(%ebp), %eax
movl %eax, 4(%esp)
movl $.LC3, (%esp)
call printf
movl $0, %eax
leave
ret
It appears that gcc is pre-generating the next stack frame (the one going down to printf) as part of the prologue of the fun function. That is what happens in this case; your compiler may be doing something totally different. But the bottom line is: the implementation can do what it likes as long as it doesn't violate the standard.
That's the code from optimisation level 0, by the way, and it gives a difference of 48. When I use gcc's insane optimisation level 3, I get a difference of 4. Again, perfectly acceptable, gcc usually does some pretty impressive optimisations at that level.
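If you want to repeat the experiment, note that the question's code passes pointers to %x and casts them to unsigned int, which is not portable and truncates addresses on 64-bit targets. A slightly safer sketch is shown below; addr_gap is a hypothetical helper, and the number it reports for two unrelated stack objects is still whatever layout your compiler happened to choose.

```c
#include <stdint.h>
#include <stdio.h>

/* Absolute distance in bytes between two addresses, computed on uintptr_t
   values rather than by subtracting unrelated pointers. */
uintptr_t addr_gap(const void *a, const void *b)
{
    uintptr_t ua = (uintptr_t)a;
    uintptr_t ub = (uintptr_t)b;
    return ua > ub ? ua - ub : ub - ua;
}

/* Counterpart of fun() from the question, using %p for addresses. */
int fun(int *ptr)
{
    int j;
    printf("the difference is %ju\n", (uintmax_t)addr_gap(ptr, &j));
    printf("the address of j is %p\n", (void *)&j);
    return 0;
}
```

Only distances between elements of the same array are pinned down by the language; anything between separate locals can change with the compiler, its version, or the optimisation level.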