Will compilers optimize double logical negation in conditionals? - c

Consider the following hypothetical type:
typedef struct Stack {
    unsigned long len;
    void **elements;
} Stack;
And the following hypothetical macros for dealing with the type (purely for enhanced readability). In these macros I am assuming that the given argument has type (Stack *) rather than merely Stack (I can't be bothered to type out a _Generic expression here).
#define stackNull(stack) (!stack->len)
#define stackHasItems(stack) (stack->len)
Why do I not simply use !stackNull(x) to check whether a stack has items? I thought this would be slightly less efficient (read: not noticeable at all really, but I thought it was interesting) than simply checking stack->len, because it would lead to double negation. Consider the following case:
int thingy = !!31337;
printf("%d\n", thingy);
if (thingy)
    doSomethingImportant(thingy);
The string "1\n" would be printed, and it would be impossible to optimize the conditional away (well, actually, only impossible if thingy didn't have a constant initializer or was modified before the test, but let's say in this instance that 31337 is not a constant), because (!!x) is guaranteed to be either 0 or 1.
But I'm wondering whether compilers will optimize something like the following:
int thingy = wellOkaySoImNotAConstantThingyAnyMore();
if (!!thingy)
    doSomethingFarLessImportant();
Will this be optimized to just use (thingy) in the if statement, as if it had been written as
if (thingy)
    doSomethingFarLessImportant();
If so, does the same apply to (!!!!!thingy) and so on? (Though that is a slightly different question, since it can be folded in any case: !thingy is !!!!!thingy no matter what, just as -(-(-(1))) = -1.)
In the question title I said "compilers"; by that I mean I am asking whether any compiler does this, though I am particularly interested in how GCC behaves here, as it is my compiler of choice.

This seems like a pretty reasonable optimization, and a quick test on godbolt with this code (see it live):
#include <stdio.h>

void func( int x )
{
    if( !!x )
    {
        printf( "first\n" ) ;
    }
    if( !!!x )
    {
        printf( "second\n" ) ;
    }
}

int main()
{
    int x = 0 ;
    scanf( "%d", &x ) ;
    func( x ) ;
}
seems to indicate gcc handles this well; it generates the following:
func:
    testl   %edi, %edi   # x
    jne     .L4          #,
    movl    $.LC1, %edi  #,
    jmp     puts         #
.L4:
    movl    $.LC0, %edi  #,
    jmp     puts         #
We can see from the first line:
    testl   %edi, %edi   # x
that it just tests x without performing any operations on it. Also notice the optimizer is clever enough to combine both tests into one, since if the first condition is true the other must be false.
Note I used printf and scanf for side effects to prevent the optimizer from optimizing all the code away.


How to optimize "don't care" argument with gcc?

Sometimes a function doesn't use an argument (perhaps because another "flags" argument doesn't enable a specific feature).
However, you have to specify something, so usually you just put 0. But if you do that, and the function is external, gcc will emit code to "really make sure" that parameter gets set to 0.
Is there a way to tell gcc that a particular argument to a function doesn't matter and it can leave alone whatever value it is that happens to be in the argument register right then?
Update: Someone asked about the XY problem. The context behind this question is I want to implement a varargs function in x86_64 without using the compiler varargs support. This is simplest when the parameters are on the stack, so I declare my functions to take 5 or 6 dummy parameters first, so that the last non-vararg parameter and all of the vararg parameters end up on the stack. This works fine, except it's clearly not optimal - when looking at the assembly code it's clear that gcc is initializing all those argument registers to zero in the caller.
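To make the setup concrete, here is a sketch of the dummy-parameter trick (my illustration with invented names, assuming the SysV x86-64 calling convention where the first six integer arguments travel in registers):

#include <stdarg.h>
#include <stdio.h>

/* Five dummy parameters plus "count" occupy all six integer argument
   registers (rdi, rsi, rdx, rcx, r8, r9), so every variadic argument
   is forced onto the stack. The real implementation would walk the
   stack directly; this sketch uses stdarg only so it compiles and runs. */
void my_vararg_func(long pad1, long pad2, long pad3,
                    long pad4, long pad5, long count, ...)
{
    (void)pad1; (void)pad2; (void)pad3; (void)pad4; (void)pad5;
    va_list ap;
    va_start(ap, count);
    for (long i = 0; i < count; i++)
        printf("%ld\n", va_arg(ap, long));
    va_end(ap);
}

int main(void)
{
    /* The five zeroes are exactly the "don't care" register
       initializations I would like gcc to omit. */
    my_vararg_func(0, 0, 0, 0, 0, 3, 10L, 20L, 30L);
    return 0;
}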
Please don't take the answer below too seriously. The question asks for a hack, so there you go.
GCC effectively treats the value of an uninitialized variable as "don't care", so we can try exploiting this:
int foo(int x, int y);

int bar_1(int y) {
    int tmp = tmp; // Suppress uninitialized warnings
    return foo(tmp, y);
}
Unfortunately, my version of GCC still cowardly initializes tmp to zero, but yours may be more aggressive:
bar_1:
.LFB0:
    .cfi_startproc
    movl    %edi, %esi
    xorl    %edi, %edi
    jmp     foo
    .cfi_endproc
Another option is (ab)using inline assembly to trick GCC into thinking that tmp is defined (when in fact it isn't):
int bar_2(int y) {
    int tmp;
    asm("" : "=r"(tmp));
    return foo(tmp, y);
}
With this, GCC manages to get rid of the parameter initialization:
bar_2:
.LFB1:
    .cfi_startproc
    movl    %edi, %esi
    jmp     foo
    .cfi_endproc
Note that the inline asm must come immediately before the function call; otherwise GCC will think it has to preserve the output value, which would harm register allocation.
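As a hypothetical counter-example of that (do_other_work is invented), this placement would hurt:

int foo(int x, int y);
void do_other_work(void);

int bar_3(int y) {
    int tmp;
    asm("" : "=r"(tmp)); /* tmp becomes "live" here... */
    do_other_work();     /* ...so GCC must preserve it across this call, */
    return foo(tmp, y);  /* likely burning a call-saved register */
}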

difference between the function performance when passing parameter as compile time constant or variable

In the Linux kernel code there is a macro used to test a bit (Linux version 2.6.2):
#define test_bit(nr, addr) \
    (__builtin_constant_p((nr)) \
        ? constant_test_bit((nr), (addr)) \
        : variable_test_bit((nr), (addr)))
where constant_test_bit and variable_test_bit are defined as:
static inline int constant_test_bit(int nr, const volatile unsigned long *addr)
{
    return ((1UL << (nr & 31)) & (addr[nr >> 5])) != 0;
}

static __inline__ int variable_test_bit(int nr, const volatile unsigned long *addr)
{
    int oldbit;

    __asm__ __volatile__(
        "btl %2,%1\n\tsbbl %0,%0"
        : "=r" (oldbit)
        : "m" (ADDR), "Ir" (nr));
    return oldbit;
}
I understand that __builtin_constant_p is used to detect whether an expression is a compile-time constant. My question is: is there any performance difference between these two functions depending on whether the argument is a compile-time constant? Why use the C version when it is, and the assembly version when it's not?
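To convince myself how the dispatch works, I wrote this minimal stand-alone test (my own sketch, not kernel code); __builtin_constant_p is answered at compile time, so the untaken branch of the conditional can be discarded entirely:

#include <stdio.h>

#define describe(n) (__builtin_constant_p(n) ? "constant" : "variable")

int main(void)
{
    int x = 0;
    if (scanf("%d", &x) != 1)
        return 1;
    puts(describe(42)); /* prints "constant": known while compiling */
    puts(describe(x));  /* prints "variable": known only at run time */
    return 0;
}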
UPDATE: The following main function is used to test the performance:
constant, call constant_test_bit:
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    unsigned long i, j = 21;
    unsigned long cnt = 0;
    srand(111);
    //j = rand() % 31;
    for (i = 1; i < (1 << 30); i++) {
        j = (j + 1) % 28;
        if (constant_test_bit(j, &i))
            cnt++;
    }
    if (__builtin_constant_p(j))
        printf("j is a compile time constant\n");
    return 0;
}
This correctly outputs the sentence j is a...
For the other situations just uncomment the line which assigns a "random" number to j and change the function name accordingly. When that line is uncommented the output will be empty, and this is expected.
I use gcc test.c -O1 to compile, and here is the result:
constant, constant_test_bit:
$ time ./a.out
j is compile time constant
real 0m0.454s
user 0m0.450s
sys 0m0.000s
constant, variable_test_bit (omitting time ./a.out; same for the following):
j is compile time constant
real 0m0.885s
user 0m0.883s
sys 0m0.000s
variable, constant_test_bit:
real 0m0.485s
user 0m0.477s
sys 0m0.007s
variable, variable_test_bit:
real 0m3.471s
user 0m3.467s
sys 0m0.000s
I ran each version several times, and the above results are typical. It seems the constant_test_bit function is always faster than the variable_test_bit function, regardless of whether the parameter is a compile-time constant or not... For the last two results (when j is not constant), the variable version is even dramatically slower than the constant one.
Am I missing something here?
Using godbolt we can run an experiment with constant_test_bit; the following two test functions are compiled with gcc using the -O3 flag:
// Non-constant expression test case
int func1(unsigned long i, unsigned long j)
{
    int x = constant_test_bit(j, &i);
    return x;
}

// Constant expression test case
int func2(unsigned long i)
{
    int x = constant_test_bit(21, &i);
    return x;
}
We see the optimizer is able to optimize the constant expression case to the following:
    shrq    $21, %rax
    andl    $1, %eax
while the non-constant expression case ends up as follows:
    sarl    $5, %eax
    andl    $31, %ecx
    cltq
    leaq    -8(%rsp,%rax,8), %rax
    movq    (%rax), %rax
    shrq    %cl, %rax
    andl    $1, %eax
So the optimizer is able to produce much better code for the constant-expression case, and we can see that the non-constant case for constant_test_bit is pretty bad compared to the hand-rolled assembly in variable_test_bit. The implementer must believe that the constant-expression case for constant_test_bit ends up being better than:
    btl     %edi, 8(%rsp)
    sbbl    %esi, %esi
for most cases.
As to why your test case seems to show a different conclusion: your test case is flawed. I have not been able to suss out all the issues, but if we look at the case using constant_test_bit with a non-constant expression, we can see the optimizer is able to move all the work outside the loop and reduce the work related to constant_test_bit inside the loop to:
    movq    (%rax), %rdi
even with an older gcc version. But this case may not be representative of the cases test_bit is actually used in, and there may be more specific cases where this kind of optimization won't be possible.

Can A C Compiler Eliminate This Conditional Test At Runtime?

Let's say I have pseudocode like this:
main() {
    BOOL b = get_bool_from_environment(); //get it from a file, network, registry, whatever
    while (true) {
        do_stuff(b);
    }
}

do_stuff(BOOL b) {
    if (b)
        path_a();
    else
        path_b();
}
Now, since we know that the external environment can influence get_bool_from_environment() to potentially produce either a true or false result, then we know that the code for both the true and false branches of if(b) must be included in the binary. We can't simply omit path_a(); or path_b(); from the code.
BUT -- we only set BOOL b the one time, and we always reuse the same value after program initialization.
If I were to make this valid C code and then compile it with gcc -O0, the if(b) would be re-evaluated on the processor each time do_stuff(b) is invoked, inserting what are, in my opinion, needless instructions into the pipeline for a branch that is essentially static after initialization.
If I assumed I actually had a compiler as stupid as gcc -O0, I would rewrite this code to use a function pointer and two separate functions, do_stuff_a() and do_stuff_b(), which skip the if(b) test and simply perform one of the two paths. Then, in main(), I would assign the function pointer based on the value of b and call that function in the loop. This eliminates the branch, though it admittedly adds a memory access for the function pointer dereference (due to the architecture implementation, I don't think I really need to worry about that).
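For concreteness, here is a sketch of that function-pointer rewrite (with stub definitions invented so it stands alone):

int get_bool_from_environment(void) { return 1; } /* stub */
void path_a(void) { /* ... */ }
void path_b(void) { /* ... */ }

void do_stuff_a(void) { path_a(); } /* no if(b) test */
void do_stuff_b(void) { path_b(); }

int main(void)
{
    /* Select the implementation once, after initialization... */
    void (*do_stuff)(void) =
        get_bool_from_environment() ? do_stuff_a : do_stuff_b;
    while (1)
        do_stuff(); /* ...then pay only an indirect call, never the branch */
}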
Is it possible, even in principle, for a compiler to take code of the same style as the original pseudocode sample, and to realize that the test is unnecessary once the value of b is assigned once in main()? If so, what is the theoretical name for this compiler optimization, and can you please give an example of an actual compiler implementation (open source or otherwise) which does this?
I realize that compilers can't generate dynamic code at runtime, and the only types of systems that could do that in principle are bytecode virtual machines or interpreters (e.g., Java, .NET, Ruby, etc.). So the question remains whether it is possible to do this statically: generate code that contains both the path_a() and path_b() branches, but avoid evaluating the conditional test if(b) on every call of do_stuff(b).
If you tell your compiler to optimise, you have a good chance that the if(b) is evaluated only once.
Slightly modifying the given example, using the standard _Bool instead of BOOL, and adding the missing return types and declarations,
_Bool get_bool_from_environment(void);
void path_a(void);
void path_b(void);

void do_stuff(_Bool b) {
    if (b)
        path_a();
    else
        path_b();
}

int main(void) {
    _Bool b = get_bool_from_environment(); //get it from a file, network, registry, whatever
    while (1) {
        do_stuff(b);
    }
}
the (relevant part of the) produced assembly by clang -O3 [clang-3.0] is
    callq   get_bool_from_environment
    cmpb    $1, %al
    jne     .LBB1_2
    .align  16, 0x90
.LBB1_1:                # %do_stuff.exit.backedge.us
                        # =>This Inner Loop Header: Depth=1
    callq   path_a
    jmp     .LBB1_1
    .align  16, 0x90
.LBB1_2:                # %do_stuff.exit.backedge
                        # =>This Inner Loop Header: Depth=1
    callq   path_b
    jmp     .LBB1_2
b is tested only once, and main jumps into an infinite loop of either path_a or path_b depending on the value of b. If path_a and path_b are small enough, they would be inlined (I strongly expect). With -O and -O2, the code produced by clang would evaluate b in each iteration of the loop.
gcc (4.6.2) behaves similarly with -O3:
    call    get_bool_from_environment
    testb   %al, %al
    jne     .L8
    .p2align 4,,10
    .p2align 3
.L9:
    call    path_b
    .p2align 4,,6
    jmp     .L9
.L8:
    .p2align 4,,8
    call    path_a
    .p2align 4,,8
    call    path_a
    .p2align 4,,5
    jmp     .L8
Oddly, it unrolled the loop for path_a but not for path_b. With -O2 or -O, however, it would call do_stuff inside the infinite loop.
Hence, to
Is it possible, even in principle, for a compiler to take code of the same style as the original pseudocode sample, and to realize that the test is unnecessary once the value of b is assigned once in main()?
the answer is a definitive yes: it is possible for compilers to recognize this and take advantage of that fact. Good compilers do when asked to optimise hard.
If so, what is the theoretical name for this compiler optimization, and can you please give an example of an actual compiler implementation (open source or otherwise) which does this?
I don't know the name of the optimisation, but two implementations doing that are gcc and clang (at least, recent enough releases).

C/C++: is GOTO faster than WHILE and FOR?

I know that everybody hates goto and nobody recommends it. But that's not the point. I just want to know which code is the fastest:
the goto loop:
int i = 3;
loop:
    printf("something");
    if (--i) goto loop;

the while loop:
int i = 3;
while (i--) {
    printf("something");
}

the for loop:
for (int i = 3; i; i--) {
    printf("something");
}
Generally speaking, for and while loops get compiled to the same thing as goto, so it usually won't make a difference. If you have your doubts, you can feel free to try all three and see which takes longer. Odds are you'll be unable to measure a difference, even if you loop a billion times.
If you look at this answer, you'll see that the compiler can generate exactly the same code for for, while, and goto (only in this case there was no condition).
The only time I've seen the argument made for goto was in one of W. Richard Stevens' articles or books. His point was that in a very time-critical section of code (I believe his example was the network stack), having nested if/else blocks with related error-handling code could be redone using goto in a way that made a valuable difference.
Personally, I'm not good enough a programmer to argue with Stevens' work, so I won't try. goto can be useful for performance-related issues, but the limits of when that is so are fairly strict.
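To illustrate the shape of the idea (my own made-up example, not Stevens' actual code): with goto, each failure point jumps to exactly the cleanup it needs, instead of duplicating cleanup in ever-deeper else blocks:

#include <stdlib.h>

int setup(void)
{
    int err = -1;
    char *buf_a = malloc(64);
    char *buf_b = NULL;

    if (buf_a == NULL)
        goto out;
    buf_b = malloc(64);
    if (buf_b == NULL)
        goto free_a;

    /* ... the success path would use buf_a and buf_b here ... */
    err = 0;

    free(buf_b);
free_a:
    free(buf_a);
out:
    return err;
}

int main(void) { return setup(); }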
Write short programs, then do this:
gcc -S -O2 p1.c
gcc -S -O2 p2.c
gcc -S -O2 p3.c
Analyze the output and see if there's any difference. Be sure to introduce some level of unpredictability such that the compiler doesn't optimize the program away to nothing.
Compilers do a great job of optimizing these trivial concerns. I'd suggest not worrying about it, and instead focusing on what makes you more productive as a programmer.
Speed and efficiency are great things to worry about, but 99% of the time that involves using proper data structures and algorithms... not worrying about whether a for is faster than a while or a goto, etc.
It is probably compiler, optimiser, and architecture specific.
For example, if(--i) goto loop; is a conditional test followed by an unconditional branch. A compiler might simply generate corresponding code, or it might be smart enough (though a compiler lacking that much smarts may not be worth much) to generate a single conditional branch instruction. while(i--), on the other hand, is already a conditional branch at the source level, so translation to a conditional branch at the machine level is more likely regardless of the sophistication of the compiler implementation or optimiser.
In the end, the difference is likely to be minute and only relevant if a great many iterations are required. The way you should answer this question is to build the code for the specific target and compiler (and compiler settings) of interest, and either inspect the resulting machine code or directly measure the execution time.
In your examples the printf() in the loop will dominate any timing in any case; something simpler in the loop would make observations of the differences easier. I would suggest an empty loop, and then declaring i volatile to prevent the loop being optimised to nothing.
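Something along these lines, say (a sketch of that suggestion; pick whatever iteration count suits your machine):

/* Empty-body loop for comparing the generated loop control itself.
   volatile forces every increment to actually happen, so the loop
   cannot be optimised away to nothing. */
int main(void)
{
    volatile long i;
    for (i = 0; i < 1000000000L; i++) {
        /* intentionally empty */
    }
    return 0;
}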
As long as you're generating the same flow of control as a normal loop, pretty nearly any decent compiler can and will produce the same code whether you use for, while, etc. for it.
You can gain something from using goto, but usually only if you're generating a flow of control that a normal loop simply can't (at least cleanly). A typical example is jumping into the middle of a loop to get a loop and a half construct, which most languages' normal loop statements (including C's) don't provide cleanly.
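For instance, here is a minimal sketch of a loop and a half that echoes its input; the read appears only once because goto enters the loop at its midpoint:

#include <stdio.h>

int main(void)
{
    int c;
    goto first;         /* jump into the middle of the loop */
    do {
        putchar(c);     /* the "half": process the item */
first:
        c = getchar();  /* full iterations resume here: fetch the next item */
    } while (c != EOF);
    return 0;
}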
There should not be any significant difference between any of the loops and the goto, except for the possibility that the compiler will not try to optimize the goto constructs at all.
And there is not much sense in trying to optimize the compiler-generated loop control. It makes more sense to optimize the code inside the loop, reduce the number of iterations, and so on.
I think the compiler will generate much the same code for all of these under normal conditions. In fact, I think goto is very convenient sometimes, although it is hard to read.
On Linux, I compiled the code below into assembly using both g++ and clang++. For more information on how I did that, see here. (Short version: g++ -S -O3 filename.cpp and clang++ -S -O3 filename.cpp, plus some assembly comments you'll see below to help me out.)
Conclusion/TL;DR at the bottom.
First, I compared label: and goto vs. do {} while. You can't compare a for () {} loop with this (in good faith), because a for loop always evaluates the condition first. Here, the condition is evaluated only after the loop code has been executed once.
#include <iostream>

void testGoto()
{
    __asm("//startTest");
    int i = 0;
loop:
    std::cout << i;
    ++i;
    if (i < 100)
    {
        goto loop;
    }
    __asm("//endTest");
}

#include <iostream>

void testDoWhile()
{
    __asm("//startTest");
    int i = 0;
    do
    {
        std::cout << i;
        ++i;
    }
    while (i < 100);
    __asm("//endTest");
}
In both cases, the assembly is the exact same regardless of goto or do {} while, per compiler:
g++:
    xorl    %ebx, %ebx
    leaq    _ZSt4cout(%rip), %rbp
    .p2align 4,,10
    .p2align 3
.L2:
    movl    %ebx, %esi
    movq    %rbp, %rdi
    addl    $1, %ebx
    call    _ZNSolsEi@PLT
    cmpl    $100, %ebx
    jne     .L2
clang++:
    xorl    %ebx, %ebx
    .p2align 4, 0x90
.LBB0_1:                # =>This Inner Loop Header: Depth=1
    movl    $_ZSt4cout, %edi
    movl    %ebx, %esi
    callq   _ZNSolsEi
    addl    $1, %ebx
    cmpl    $100, %ebx
    jne     .LBB0_1
# %bb.2:
Then I compared label: and goto vs. while {} vs. for () {}. This time around, the condition is evaluated before the loop code has been executed even once.
For goto, I had to invert the condition, at least for the first time. I saw two ways of implementing it, so I tried both ways.
#include <iostream>

void testGoto1()
{
    __asm("//startTest");
    int i = 0;
loop:
    if (i >= 100)
    {
        goto exitLoop;
    }
    std::cout << i;
    ++i;
    goto loop;
exitLoop:
    __asm("//endTest");
}

#include <iostream>

void testGoto2()
{
    __asm("//startTest");
    int i = 0;
    if (i >= 100)
    {
        goto exitLoop;
    }
loop:
    std::cout << i;
    ++i;
    if (i < 100)
    {
        goto loop;
    }
exitLoop:
    __asm("//endTest");
}

#include <iostream>

void testWhile()
{
    __asm("//startTest");
    int i = 0;
    while (i < 100)
    {
        std::cout << i;
        ++i;
    }
    __asm("//endTest");
}

#include <iostream>

void testFor()
{
    __asm("//startTest");
    for (int i = 0; i < 100; ++i)
    {
        std::cout << i;
    }
    __asm("//endTest");
}
As above, in all four cases the assembly is exactly the same regardless of goto 1 or 2, while {}, or for () {}, per compiler, with just one tiny exception for g++ that may be meaningless:
g++:
    xorl    %ebx, %ebx
    leaq    _ZSt4cout(%rip), %rbp
    .p2align 4,,10
    .p2align 3
.L2:
    movl    %ebx, %esi
    movq    %rbp, %rdi
    addl    $1, %ebx
    call    _ZNSolsEi@PLT
    cmpl    $100, %ebx
    jne     .L2
Exception for g++: at the end of the goto2 assembly, it added:
.L3:
    endbr64
(I presume this extra label was optimized out of goto1's assembly.) I would assume that this is completely insignificant, though.
clang++:
    xorl    %ebx, %ebx
    .p2align 4, 0x90
.LBB0_1:                # =>This Inner Loop Header: Depth=1
    movl    $_ZSt4cout, %edi
    movl    %ebx, %esi
    callq   _ZNSolsEi
    addl    $1, %ebx
    cmpl    $100, %ebx
    jne     .LBB0_1
# %bb.2:
In conclusion/TL;DR: No, there does not appear to be any difference whatsoever between any of the possible equivalent arrangements of label: and goto, do {} while, while {}, and for () {}, at least on Linux using g++ 9.3.0 and clang++ 10.0.0.
Note that I did not test break and continue here; however, given that the assembly code generated for each of the 4 in any scenario was the same, I can only presume that they would be the exact same for break and continue, especially since the assembly is using labels and jumps for every scenario.
To ensure correct results, I was very meticulous in my process and also used Visual Studio Code's compare files feature.
There are several niche verticals where goto is still commonly used as standard practice by some very smart folks, and there is no bias against goto in those settings. I used to work at a simulations-focused company where the local Fortran code had tons of gotos; the team was super smart, and the software worked nearly flawlessly.
So we can leave the merits of goto aside. If the question is merely to compare the loops, then we do so by profiling and/or comparing the assembly code. That said, the question includes statements like printf, etc.; you can't really have a discussion about loop-control optimization when doing that. Also, as others have pointed out, the given loops will all generate VERY similar machine code.
All conditional branches are considered "taken" (true) in pipelined processor architectures anyway until the decode phase, and small loops are usually expanded to be loopless. So, in line with Harper's point above, it is unrealistic for goto to have any advantage whatsoever in simple loop control (just as for and while don't have an advantage over each other). goto usually makes sense in multiply nested loops or multiply nested ifs, when adding the additional condition checked by goto into EACH of the nested loops or nested ifs would be suboptimal.
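For example (a minimal sketch with invented data), exiting a doubly nested loop with a single goto avoids threading a found flag through both loop conditions:

#include <stdio.h>

int main(void)
{
    int grid[3][3] = { {1, 2, 3}, {4, 5, 6}, {7, 8, 9} };
    int target = 5;

    for (int i = 0; i < 3; i++) {
        for (int j = 0; j < 3; j++) {
            if (grid[i][j] == target)
                goto found; /* one jump instead of a flag tested in both loops */
        }
    }
    printf("not found\n");
    return 1;
found:
    printf("found %d\n", target);
    return 0;
}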
When optimizing a search-type operation in a simple loop, using a sentinel is sometimes more effective than anything else. Essentially, by adding a dummy value at the end of the array, you can avoid checking two conditions (end of array and value found) in favor of just one (value found), which saves on cmp operations internally. I am not aware whether compilers do this automatically or not.
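A sketch of the sentinel idea (assuming the caller reserves one spare slot past the end of the data):

#include <stdio.h>

/* The spare slot always holds the target, so the scan needs a single
   comparison per step instead of testing both "found" and "end". */
int find(int *a, int n, int target)
{
    a[n] = target;          /* plant the sentinel in the spare slot */
    int i = 0;
    while (a[i] != target)
        i++;
    return i < n ? i : -1;  /* hitting slot n means "not really found" */
}

int main(void)
{
    int data[6] = { 3, 1, 4, 1, 5 };   /* five values + one spare slot */
    printf("%d\n", find(data, 5, 4));  /* prints 2 */
    printf("%d\n", find(data, 5, 9));  /* prints -1 */
    return 0;
}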
goto Loop:
start_Chicken:
{
    ++x;
    if (x >= loops)
        goto end_Chicken;
}
goto start_Chicken;
end_Chicken:
x = 0;

for Loop:
for (int i = 0; i < loops; i++)
{
}

while Loop:
while (z <= loops)
{
    ++z;
}
z = 0;
[Image of benchmark results omitted.]
Across the mixed tests, the while loop had marginally but consistently better results in every situation.

How to deal with -Wconversion warnings from GCC?

I'm building my project with GCC's -Wconversion warning flag (gcc (Debian 4.3.2-1.1) 4.3.2, on 64-bit GNU/Linux). I'm finding it useful in identifying where I've mixed types or lost clarity as to which types should be used.
It's not so helpful in most of the other situations that activate its warnings, and I'm asking how I am meant to deal with these:
enum { A = 45, B, C };    /* fine */
char a = A;               /* huh? seems to not warn about A being int. */
char b = a + 1;           /* warning converting from int to char */
char c = B - 2;           /* huh? ignores this *blatant* int too. */
char d = (a > b ? b : c); /* warning converting from int to char */
Given the unexpected results of the above tests (cases a and c), I'm also asking for these differences to be explained.
Edit: And is it over-engineering to cast all these with (char) to prevent the warning?
Edit2: Some extra cases (following on from above cases):
a += A;       /* warning converting from int to char */
a++;          /* ok */
a += (char)1; /* warning converting from int to char */
Aside from that, what I'm asking is subjective, and I'd like to hear how other people deal with conversion warnings in cases like these, given that some developers advocate removing all warnings.
Yet another edit (YAE):
One possible solution is to just use ints instead of chars, right? Well, actually, not only does that require more memory, it is slower too, as can be demonstrated by the following code. (The math expressions are just there to trigger the warnings when built with -Wconversion.) I assumed the version using char variables would run slower than the one using ints due to the conversions, but on my (64-bit dual-core II) system the int version is slower.
#include <stdio.h>

#ifdef USE_INT
typedef int var;
#else
typedef char var;
#endif

int main()
{
    var start = 10;
    var end = 100;
    var n = 5;
    int b = 100000000;

    while (b > 0) {
        n = (start - 5) + (n - (n % 3 ? 1 : 3));
        if (n >= end) {
            n -= (end + 7);
            n += start + 2;
        }
        b--;
    }
    return 0;
}
Pass -DUSE_INT to gcc to build the int version of the above snippet.
When you say /* int */, do you mean it's giving you a warning about that? I'm not seeing any warnings at all in this code with gcc 4.0.1 or 4.2.1 with -Wconversion. The compiler converts these enums into constants. Since everything is known at compile time, there is no reason to generate a warning. The compiler can optimize out all the uncertainty (the following is Intel with 4.2.1):
    movb    $45, -1(%rbp)   # a = 45
    movzbl  -1(%rbp), %eax
    incl    %eax
    movb    %al, -2(%rbp)   # b = 45 + 1
    movb    $44, -3(%rbp)   # c = 44 (the math is done at compile time)
    movzbl  -1(%rbp), %eax
    cmpb    -2(%rbp), %al
    jle     L2
    movzbl  -2(%rbp), %eax
    movb    %al, -17(%rbp)
    jmp     L4
L2:
    movzbl  -3(%rbp), %eax
    movb    %al, -17(%rbp)
L4:
    movzbl  -17(%rbp), %eax
    movb    %al, -4(%rbp)   # d = (a > b ? b : c)
This is without turning on optimizations. With optimizations, it will calculate b and d for you at compile time and hardcode their final values (if it actually needs them for anything). The point is that gcc has already worked out that there can't be a problem here because all the possible values fit in a char.
EDIT: Let me amend this somewhat. There is a possible error in the assignment to b, and the compiler will never catch it, even if it's certain. For example, with b = a + 250; the assignment is certain to overflow b, yet gcc will not issue a warning. That's because the assignment to a is legal, a is a char, and it's your problem (not the compiler's) to make sure the math doesn't overflow at runtime.
Maybe the compiler can already see that all the values fit into a char, so it doesn't bother warning. I'd expect the enum to be resolved right at the beginning of compilation.
