Compare the two:
if (strstr(a, "earth")) // A1
return x;
if (strstr(a, "ear")) // A2
return y;
and
if (strstr(a, "earth")) // B1
return x;
else if (strstr(a, "ear")) // B2
return y;
Personally, I feel that the else is redundant and prevents the CPU from predicting the branch.
In the first form, while executing A1, the CPU could already pre-decode A2; in the second form, it will not look at B2 until B1 has evaluated to false.
I found a lot of (maybe most?) sources using the latter form.
Admittedly, the latter form is easier to understand, because without the else clause it's not so obvious that return y is reached only if a =~ /ear(?!th)/.
Your compiler probably knows that both these examples mean exactly the same thing. CPU branch prediction doesn't come into it.
I usually would choose the first option for symmetry.
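One way to convince yourself is a quick experiment (the wrapper functions below are my own, hypothetical names): compile with gcc -O2 -S and compare the two bodies in the generated .s file.

#include <string.h>

int with_else(const char *a, int x, int y)
{
    if (strstr(a, "earth"))
        return x;
    else if (strstr(a, "ear"))
        return y;
    return 0;
}

int without_else(const char *a, int x, int y)
{
    if (strstr(a, "earth"))
        return x;
    if (strstr(a, "ear"))
        return y;
    return 0;
}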
(The following answers the original version of the question.)
Do you realize that the two code snippets are NOT semantically equivalent???
Consider what happens if a is "earth".
The first snippet calls foo() and then bar().
The second snippet calls foo() and skips the bar() call.
And this explains why the generated machine code is different. It has to be to implement the different semantics of the respective code fragments!
Personally, I feel that else is redundant ...
Unfortunately, your feeling is incorrect.
Lesson - write your code simply and clearly and leave optimization to the compiler ... which is going to do a far more accurate job than you can achieve.
FOLLOWUP
The snippets in the updated version of the question are now semantically identical, and the else is redundant. However:
any half decent optimizing compiler will generate identical code for the two snippets, and
it is a matter of opinion (i.e. subjective) which of the snippets is easier to understand.
Use else if to state your intentions clearly. Code is meant to be read by humans.
Let the compiler optimize this, and don't worry about optimization until your code is 1) working 2) crystal clear 3) profiled (do this in that order). When doing step 3, you'll notice that the bottlenecks are not where you supposed they would be.
Any attempt to control branch prediction or whatever low-level stuff is silly: compilers are very good at optimizing, and they use sophisticated methods to yield fast code on your particular machine.
Look at output from LLVM based compilers to see what I mean: sometimes you can't even remotely understand what it does.
Usually it's better to use the second way (else if) if you want to test exactly one condition on a and narrow down the options for the variable: the chain stops at the first match. If you write two separate ifs whose branches assign rather than return, both conditions can match and you can get two different results.
For example, with conditions like yours, let's say a = -2 and each branch assigns to r instead of returning:
A: if (a < 0)
       r = x; // -2 is less than 0, so r gets x and the chain stops here
   else if (a < 100)
       r = y; // skipped: the else if is never tested
B: if (a < 0)
       r = x; // -2 is less than 0, so r gets x, and control falls through to the next if
   if (a < 100)
       r = y; // -2 is also less than 100, so r is overwritten with y
(With return statements, as in the question, the two forms behave identically, because the return ends the function before the second if is reached.)
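Here is a complete, runnable sketch of that difference (the variable names are mine; the assignments stand in for the question's returns):

#include <stdio.h>

int main(void)
{
    int a = -2, r;

    /* else-if chain: only the first matching branch runs */
    r = 0;
    if (a < 0)
        r = 1;
    else if (a < 100)
        r = 2;
    printf("else-if chain: r = %d\n", r);   /* prints 1 */

    /* two separate ifs: both conditions match, so the second
       assignment overwrites the first */
    r = 0;
    if (a < 0)
        r = 1;
    if (a < 100)
        r = 2;
    printf("separate ifs:  r = %d\n", r);   /* prints 2 */

    return 0;
}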
Why not simply write
char *str = strstr(a, "ear");
if (str != NULL)
{
    foo();
    if (strstr(str, "earth") != NULL)
    {
        bar();
    }
}
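For reference, a self-contained version of that idea (the foo/bar stubs are mine, just so it compiles and runs):

#include <stdio.h>
#include <string.h>

static void foo(void) { puts("foo: \"ear\" found"); }
static void bar(void) { puts("bar: \"earth\" found"); }

static void check(const char *a)
{
    char *str = strstr(a, "ear");
    if (str != NULL)
    {
        foo();
        if (strstr(str, "earth") != NULL)
        {
            bar();
        }
    }
}

int main(void)
{
    check("early bird");    /* calls foo only */
    check("down to earth"); /* calls foo, then bar */
    return 0;
}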
I've noticed that gcc12 does not optimize these two functions to the same code (with -O3):
int x = 0;

void f(bool a)
{
    if (a) {
        ++x;
    }
}

void f2(bool a)
{
    x += a;
}
Basically no transformation is done. That can be seen here: https://godbolt.org/z/1G3n4fxEK
Optimizing f to the code in f2 seems to be trivial and no jump would be needed anymore. However, I'm curious if there's a reason why this is not done by gcc? Is it somehow still slower or something? I would assume it's never slower and sometimes faster, but I might be wrong.
Thanks
Such a substitution would be incorrect in a scenario where one thread calls f(1) while another thread calls f(0). If x is never actually accessed outside the first thread, there would be no race condition in the code as written, but the substitution would create one. If x is initially 1, nothing would prevent the code from being processed as:
thread 1: read x (yields 1)
thread 2: read x (yields 1)
thread 1: write 2
thread 2: write 1
This would cause x to be left holding the value 1 even though thread 1 has just written the value 2. Worse than that, if the function was invoked within a context like:
x = 1;
f(1);
if (x != 1)
launch_nuclear_missiles_if_x_is_1_and_otherwise_make_coffee();
a compiler might recognize that x will always equal 2 following the return from f(1), and thus make the function call unconditional.
To be sure, such substitution would rarely cause problems in real-world situations, but the Standard explicitly forbids transformations that could create race conditions where none would exist in the source code as written.
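To make the scenario concrete, here is a minimal sketch (the pthread harness, the int parameter, and the function names are my own, not part of the original question; build with cc race.c -pthread):

#include <pthread.h>
#include <stdio.h>

int x = 1;

void f(int a)  { if (a) ++x; }  /* as written: f(0) never touches x */
void f2(int a) { x += a; }      /* substituted: f2(0) reads AND writes x */

static void *call_f1(void *p) { (void)p; f(1); return NULL; }
static void *call_f0(void *p) { (void)p; f(0); return NULL; }

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, call_f1, NULL);
    pthread_create(&t2, NULL, call_f0, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    /* With f as written this always prints 2. If both threads used the
       f2 form instead, the two read-modify-write sequences could
       interleave and leave x holding 1. */
    printf("x = %d\n", x);
    return 0;
}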
I would have hoped the compiler would have changed f2 to f. Reading and writing memory may require slow transactions to acquire a copy of the memory location, and update other bus controllers about the state of that location (Invalid -> Shared -> Modified).
Jumping around an update based on a register value is quite cheap; especially with the efficacy of branch predictors.
A much simpler reason why that optimization isn't done is that bool is nothing but an alias for int. Specifically, nothing stops you from passing an arbitrary integer to your function:
int v = 5;
f2(*(bool *)&v);
// x = 5 here
I am trying to compare a C function to its equivalent assembly, and I'm kind of confused about the conditional jumps.
I looked up the jl instruction and it says "jump if <", but the answer to the question was >=. Can someone explain why that is?
To my understanding, the condition is inverted, but the logic is the same; the C source defines
if the condition is satisfied, execute the following block
whereas the assembly source defines
if the condition is violated, skip the following block
which means that the flow of execution will be the same in both implementations.
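A minimal sketch of that inversion (the function, register assignments, and instructions below are my assumptions, since the original assembly was only posted as an image):

int pick(int a, int b, int x, int y)
{
    if (a < b)      /* C view: the body runs when the condition is satisfied */
        return y;
    return x;
}

/* Plausible x86-64 output; note the compiler inverts the test:
 *   pick:
 *       cmp   edi, esi   ; set flags from a - b
 *       jge   .skip      ; condition violated (a >= b): skip the body
 *       mov   eax, ecx   ; return y
 *       ret
 *   .skip:
 *       mov   eax, edx   ; return x
 *       ret
 */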
In essence, what this assembly is doing is executing your condition as you set it, but using negative logic.
Your condition says:
If a is smaller than b, return x. Otherwise, return y.
What the assembly code says (simplified):
Move y into the buffer used for the return value. Move b into a different
buffer. If a is not smaller than b (i.e. a >= b), jump ahead to the return
step; then y is returned. If a is smaller than b, continue in the program.
The next step assigns x to the return buffer. The step after that returns
as normal.
The outcome is the same, but the process is slightly different.
Here is what the assembly does, line by line (code not included, because you posted it as an image):
foo:
return_value (eax) = y; // !!!
temporary_edx = b; // x86 can't compare memory with memory, so "b" goes to register
set_flags_by(a-b); // cmp does subtraction and discards result, except flags
"jump less to return" // so when a < b => return y (see first line)
return_value (eax) = x;
return
so to make that C code do the same thing, you need:
if (a >= b) { return x; } else { return y; }
BTW, see how easy it is to flip:
if (a < b) { return y; } else { return x; }
So there's no point in translating jl into "less" in C. You have to track down each branch, see what really happens, find the correct C-side calculation for each branch of the computation, and then "create" the condition in C that yields the same calculation on both sides. This task is not about "translating" the assembly, but about deciphering the asm logic and rewriting it back in C. It looks like you expected to get away with a simple pattern-matching translation, when you actually have to work the logic out fully.
Would it be possible to implement an if that checks for -1 and, if the result is not -1, assigns the value, but without having to call the function twice or save the return value to a local variable? I know this is possible in assembly, but is there a C implementation?
int i, x = -10;
if( func1(x) != -1) i = func1(x);
saving the return value to a local variable
In my experience, avoiding local variables is rarely worth the clarity forfeited. Most compilers can usually avoid the corresponding loads/stores and just use registers for those locals. So don't avoid the local variable, embrace it! The maintainer's sanity that gets preserved just might be your own.
I know this is possible in assembly, but is there a c implementation?
If it turns out your case is one where assembly is actually appropriate, make a declaration in a header file and link against the assembly routine.
Suggestion:
const int x = -10;
const int y = func1(x);
const int i = y != -1
    ? y
    : 0; /* You didn't really want an uninitialized value here, right? */
It depends whether or not func1 generates any side-effects. Consider rand(), or getchar() as examples. Calling these functions twice in a row might result in different return values, because they generate side effects; rand() changes the seed, and getchar() consumes a character from stdin. That is, rand() == rand() will usually¹ evaluate to false, and getchar() == getchar() can't be predicted reliably. Supposing func1 were to generate a side-effect, the return value might differ for consecutive calls with the same input, and hence func1(x) == func1(x) might evaluate to false.
If func1 doesn't generate any side-effect, and the output is consistent based solely on the input, then I fail to see why you wouldn't settle with int i = func1(x);, and base logic on whether or not i == -1. Writing the least repetitive code results in greater legibility and maintainability. If you're concerned about the efficiency of this, don't be. Your compiler is most likely smart enough to eliminate dead code, so it'll do a good job at transforming this into something fairly efficient.
¹ ... at least in any sane standard library implementation.
int c;
if((c = func1(x)) != -1) i = c;
The best implementation I could think of would be:
int i = 0; // initialize to something
const int x = -10;
const int y = func1(x);

if (y != -1)
{
    i = y;
}
The const would let the compiler do any optimizations that it thinks are best (perhaps inlining func1). Notice that func1 is only called once, which is probably best. The const y would also allow y to be kept in a register (which it would need to be anyway in order to perform the if). If you wanted to give more of a hint, you could do:
register const int y = func1(x);
However, the compiler is not required to honor your register keyword suggestion, so it's probably best to leave it out.
EDIT BASED ON INSPIRATION FROM BRIAN'S ANSWER:
int i = ((func1(x) + 1) ?: 0) - 1;
BTW, I probably wouldn't suggest using this, but it does answer the question. This is based on the SO question here. I'm still confused as to the "why" of the question; it seems like more of a puzzle or job-interview question than something that would be encountered in a "real" program. I'd certainly like to hear why this would be needed.
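For the record, the ?: with an omitted middle operand is a GNU extension (x ?: y means x ? x : y with x evaluated once), so this won't compile in strict standard C. A small sketch of how it evaluates, with a stand-in func1 of my own:

#include <stdio.h>

static int func1(int v) { return v < 0 ? -1 : v; }  /* hypothetical stub */

int main(void)
{
    int x = -10;
    /* func1(x) == -1:  (-1 + 1) ?: 0  ->  0 ?: 0  ->  0, then 0 - 1 == -1 */
    int i = ((func1(x) + 1) ?: 0) - 1;
    printf("%d\n", i);  /* prints -1; note that unlike the two-line version,
                           i is assigned even when func1 returns -1 */
    return 0;
}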
I saw this code today:
if (++counter == 10)
{
    // Do Something
    foo();
}
I think this is bad style, but is the execution compiler-dependent as well? Say counter is set to 8 before we get to this line: it's going to increment it, but will it then compare 10 to 8 (the value before the increment) or 10 to 9 (the value of counter after it got incremented)?
What do you think, SO? Is this common practice? Bad style?
There's nothing compiler-dependent in the behavior of this code (besides possible overflow behavior). Whether it is a good style is a matter of personal preference. I generally avoid making modifications in conditionals, but sometimes it can be useful and even elegant.
This code is guaranteed to compare the new value to 10 (i.e. 9 is compared to 10 in your example). Formally, it is incorrect to say that the comparison takes place after counter gets incremented. There's no "before" or "after" here. The new value can get pre-calculated and compared to 10 even before it is physically placed into counter.
In other words, the evaluation of ++counter == 10 can proceed as
counter = counter + 1
result = (counter == 10)
or as
result = ((counter + 1) == 10)
counter = counter + 1
Note that in the first case counter is incremented before the comparison, while in the second case it is incremented after the comparison. Both scenarios are valid and perfectly possible in practice. Both scenarios produce the same result required by the language specification.
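A quick test (my own harness) confirms the new value is what gets compared, using the question's own starting point of 8:

#include <stdio.h>

int main(void)
{
    int counter = 8;

    if (++counter == 10)               /* counter becomes 9; 9 == 10 is false */
        puts("old value compared?");   /* not printed */
    printf("counter = %d\n", counter); /* prints 9 */

    if (++counter == 10)               /* counter becomes 10; 10 == 10 is true */
        puts("the new value is what gets compared"); /* printed */

    return 0;
}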
Operator precedence guarantees that it is the incremented value that gets compared. You may use parentheses if you wish to make this very explicit, but I wouldn't call this bad coding style.
Personally I'd always separate this into two statements.
counter++;
if (counter == 10)
    DoSomething();
This way you don't need to think about the order in which things happen; there is no scope for confusion. It makes no difference to the generated code, and when that is so, readability and maintainability concerns are always king.
It is well defined by the language standard, and whether it is bad style or not is a matter of personal preference, and of context as well. I have one function using conditions similar to this, which I think looks and works very nicely, and which I think would be less readable if the increments were taken out of the conditions.
const char *GetStat(int statId)
{
    int id = 0;
    if (statId == id++)
    {
        return "Buffers";
    }
    else if (statId == id++)
    {
        return "VBuffers";
    }
#ifndef _XBOX
    else if (statId == id++)
    {
        return "Reset factor";
    }
#endif
    else if (statId == id++)
    {
        return "CB Mem";
    }
    return "";
}
Note: the increments are not actually "performed" at all here; a decent compiler will fold the ++ on the id variable into constants.
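In other words, after constant folding the compiler effectively sees something like this (a sketch; the exact folded form is up to the compiler):

const char *GetStat(int statId)
{
    if (statId == 0) return "Buffers";
    if (statId == 1) return "VBuffers";
#ifndef _XBOX
    if (statId == 2) return "Reset factor";
    if (statId == 3) return "CB Mem";
#else
    if (statId == 2) return "CB Mem";
#endif
    return "";
}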
I'm writing a loop in C, and I am just wondering how to optimize it a bit. It's not crucial here, as I'm just practicing, but for further knowledge, I'd like to know:
In a loop, for example the following snippet:
int i = 0;
while (i <= 10) {
    printf("%d\n", i);
    i++;
}
Does the processor check both (i < 10) and (i == 10) for every iteration, or is (i <= 10) a single check which, if true, lets it continue?
If it checks both, wouldn't:
int i = 0;
while (i != 10) {
    printf("%d\n", i);
    i++;
}
be more efficient?
Thanks!
Both will be translated into a single assembly instruction. Most CPUs have comparison/branch instructions for LESS THAN, for LESS THAN OR EQUAL, for EQUAL, and for NOT EQUAL.
One of the interesting things about these optimization questions is that they often show why you should code for clarity/correctness before worrying about the performance impact of these operations (which oh so often make no difference at all).
Your 2 example loops do not have the same behavior:
int i = 0;
/* this will print 11 lines (0..10) */
while (i <= 10) {
    printf("%d\n", i);
    i++;
}
And,
int i = 0;
/* This will print 10 lines (0..9) */
while (i != 10) {
    printf("%d\n", i);
    i++;
}
To answer your question though, it's nearly certain that the performance of the two constructs would be identical (assuming that you fixed the problem so the loop counts were the same). For example, if your processor could only check for equality and whether one value were less than another in two separate steps (which would be a very unusual processor), then the compiler would likely transform the (i <= 10) to an (i < 11) test - or maybe an (i != 11) test.
This is a clear example of premature optimization.... IMHO, that is something programmers new to their craft are way too prone to worry about. If you must worry about it, learn to benchmark and profile your code so that your worries are based on evidence rather than supposition.
Speaking to your specific questions: first, <= is not implemented as two operations testing for < and == separately in any C compiler I've met in my career. And that includes some monumentally stupid compilers. Notice that for integers, a <= 5 is the same condition as a < 6, and if the target architecture required that only < be used, that is what the code generator would do.
Your second concern, that while (i != 10) might be more efficient raises an interesting issue of defensive programming. First, no it isn't any more efficient in any reasonable target architecture. However, it raises a potential for a small bug to cause a larger failure. Consider this: if some line of code within the body of the loop modified i, say by making it greater than 10, what might happen? How long would it take for the loop to end, and would there be any other consequences of the error?
Finally, when wondering about this kind of thing, it often is worthwhile to find out what code the compiler you are using actually generates. Most compilers provide a mechanism to do this. For GCC, learn about the -S option which will cause it to produce the assembly code directly instead of producing an object file.
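For example, a minimal experiment might look like this (the file name and flags are my own choice):

/* loop.c -- compile with:  gcc -O2 -S loop.c
 * then inspect loop.s to see which comparison/branch instruction the
 * compiler actually emitted. Swap the condition for (i != 11), rebuild,
 * and diff the two .s files to see whether it made any difference. */
#include <stdio.h>

int main(void)
{
    int i = 0;
    while (i <= 10) {
        printf("%d\n", i);
        i++;
    }
    return 0;
}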
The operators <= and < are a single instruction in assembly, there should be no performance difference.
Note that tests against 0 can be a bit faster on some processors than tests against any other constant, so it can be reasonable to make the loop run backwards:
int i = 10;
while (i != 0)
{
    printf("%d\n", i);
    i--;
}
Note that micro-optimizations like these usually gain you only very little extra performance; your time is better spent on using efficient algorithms.
Does the processor check both (i < 10) and (i == 10) for every iteration, or is (i <= 10) a single check which, if true, lets it continue?
Neither; it will most likely check (i < 11). The <= 10 is just there to give better meaning to your code, since 11 would be a magic number that actually means (10 + 1).
It depends on the architecture and compiler. On most architectures, there is a single instruction for <= (or for its opposite, which can be negated), so if the code is translated into a loop, the comparison will most likely be a single instruction. (On x86 or x86_64 it is one instruction.)
The compiler might unroll the loop into a sequence of ten i++ operations; when only constant expressions are involved, it will even optimize the ++ away and leave only constants.
And Ira is right: the cost of the comparison vanishes when a printf is involved, since its execution time might be millions of clock cycles.
I'm writing a loop in C, and I am just wondering how to optimize it a bit.
If you compile with optimizations turned on, the biggest optimization will be from unrolling that loop.
It's going to be hard to profile that code with -O2, because for trivial functions the compiler will unroll the loop and you won't be able to benchmark actual differences in compares. You should be careful when profiling test cases that use constants that might make the code trivial when optimized by the compiler.
Disassemble. Depending on the processor, the optimization level, and a number of other things, this simple example code may actually unroll or turn into something that does not reflect your real question. Compiling both example loops you provided with gcc -O1, though, resulted in the same assembler (for ARM).
A less-than in your C code often turns into a branch-if-greater-than-or-equal to the far side of the loop. If your processor doesn't have a greater-than-or-equal, it may have a branch-if-greater-than plus a branch-if-equal: two instructions.
Typically, though, there will be a register holding i and an instruction to increment i, followed by an instruction to compare i with 10. Equal-to, greater-than-or-equal, and less-than are generally each a single instruction, so you should not normally see a difference.
// Case I
int i = 0;
while (i < 10) {
    printf("%d\n", i);
    i++;
    printf("%d\n", i);
    i++;
}

// Case II
int i = 0;
while (i < 10) {
    printf("%d\n", i);
    i++;
}
The Case I code takes more space but is faster; the Case II code takes less space but is slower compared to Case I.
In programming, space complexity and time complexity usually trade off against each other, which means you must compromise on either space or time.
So you can optimize your time complexity or your space complexity, but not both.
And performance-wise, both of your loops are the same.