I was trying to see the speed difference between == and !=, and it occurred to me that there might be a possibility that the order in an if-else doesn't matter. Purely logically, if you need to test a condition, and have only 2 options, there should not be any difference if you jump to the "if" part or the "else" part.
At least this was my thought process, knowing nothing about how it actually works. This is where you come in.
Here is some code to show what I am trying to choose between:
if (x == 10)
// do stuff. this will be true 20% of the time
else
// do frequent stuff
if (x != 10)
// do frequent stuff 80% of time
else
// do other stuff 20% of the time
Please help
Modern CPUs are pipelined, so they start executing the next instruction before they have finished the current one. This is great for performance, but it creates a problem when the CPU reaches a branch, such as an if statement. The CPU then has to guess which way to go. If it guesses right, everything continues at full speed; if it guesses wrong, it has to discard the work it started and follow the correct path, which is quite costly.
You don't have to worry about exactly what drives the CPU's guess, because gcc and clang provide __builtin_expect, which tells the compiler which outcome of a branch is more likely.
In your case you could write your code as
if (__builtin_expect(x == 10, 0))
// do stuff (not expected to happen)
else
// do frequent stuff (expected)
or
if (__builtin_expect(x != 10, 1))
// do frequent stuff (expected)
else
// do stuff (not expected)
You can look at this stackoverflow post for more:
What is the advantage of GCC's __builtin_expect in if else statements?
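As a minimal sketch of how the hint is often wrapped in practice: the macro names likely and unlikely below are a common convention, not something gcc or clang define, and count_tens is a hypothetical function made up for illustration. It assumes the value 10 is rare in the input.

```c
#include <stddef.h>

/* Hypothetical convenience macros; fall back to a plain expression
   on compilers that lack __builtin_expect. */
#if defined(__GNUC__) || defined(__clang__)
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
#else
#define likely(x)   (x)
#define unlikely(x) (x)
#endif

/* Counts how many entries equal 10, hinting that this is the rare case. */
size_t count_tens(const int *values, size_t n)
{
    size_t tens = 0;
    for (size_t i = 0; i < n; i++) {
        if (unlikely(values[i] == 10))
            tens++;   /* rare path */
        /* frequent path: nothing to do */
    }
    return tens;
}
```

The hint only affects how the compiler lays out the code; the result is the same with or without it, so it is worth measuring before keeping it.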
With regard to coding in C, which would be faster: checking the state with an if first, or just running the assignment anyway? For example, say the output is already 1.
if(a==b && output!=1)
{
output=1;
}
Or
if(a==b)
{
output=1;
}
In the first code, an extra check has to be run every time the code runs.
In the second you are running the code repeatedly unnecessarily
Which is more efficient??
The question basically boils down to whether a compare is less expensive than a variable assignment. For integers, the answer is no. I am assuming this will be in a tight loop where the variables will already be in the CPU level 1 cache. The compare will compile down to op codes like:
1) Move "output" memory locations data into Register A
2) Put 1 into Register B
3) Jump <somewhere> if Register A == Register B.
You might get an optimization where 2) is not done if comparing to 0 because there are special op codes for comparing to 0 in most CPUs.
The assignment will compile to op codes like:
1) Put 1 into Register A
2) Push Register A to memory location of output
The question comes down to the clock cycles spent on each of the op codes. I think they are all likely to take exactly the same number of clock cycles.
Regardless of any possible optimization, as shown in the comments, the first code is less efficient than the second code due to the extra check.
Beware of your data meaning, that check may be mandatory.
If not, you should optimize your code as suggested.
Edit
I'm assuming your question is more theoretical than practical. In any real scenario, the data context plays a huge role when we want to optimize some code.
The code doesn't need to be fast in itself; it needs to be fast at processing its data.
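To make the comparison concrete, here is a small sketch of the two variants as functions. The function names and the pointer parameter are assumptions added for illustration. Both leave output in the same state; they differ only in how many loads and stores happen, which matters if, for example, output is a hardware register or shared between threads.

```c
/* Variant 1: test before storing. Reads output every time a == b,
   but writes it only when it is not already 1. */
void set_if_needed(int a, int b, int *output)
{
    if (a == b && *output != 1)
        *output = 1;
}

/* Variant 2: store whenever a == b, even if output is already 1. */
void set_always(int a, int b, int *output)
{
    if (a == b)
        *output = 1;
}
```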
Hi there everyone!
I'm making good progress on my AVR project for DIY sprinkler and fish tank automation, but I've come across a question that bugs me.
Which if statement runs faster on the AVR (in fewer clock cycles)?
By how much?
if(temp_sensor[0] < -20)
{
OCR1A--;
}
else if(tempout > tempset)
{
OCR1A--;
}
Or
if((temp_sensor[0] < -20) || (tempout > tempset))
{
OCR1A--;
}
On second thought, my second question is:
Which one uses less space?
My conclusion:
First of all, thanks everyone for your answers and comments!
The primary objective should be to write clean code that is easy to understand.
You could try for a (seemingly) jumpless approach:
const int8_t delta = temp_sensor[0] < -20 || tempout > tempset;
OCR1A -= delta;
That can sometimes give shorter code. Of course it's very CPU-dependent, not sure how well the AVR likes code like this. It very well might generate a jump for the short-circuiting of the || operator, too. It's also totally possible for the compiler to optimize itself out of the jumps all on its own.
Write code for readability, not for speed. Especially in those very trivial cases where the compiler can easily figure out what is happening and optimize it.
You should avoid the 1st way because you have some duplicated code in it, which isn't ideal.
Also I must point out that unless optimized, an || or an &&, are compiled to branch instructions the same way an if statement is, so they do improve readability in the code but don't really bring any performance advantage.
Writing the condition this way
if((temp_sensor[0] < -20) || (tempout > tempset))
{
OCR1A--;
}
is a better way to write this kind of condition. It is more understandable. As for whether it takes fewer ticks, the difference is not really significant unless you do it hundreds of thousands of times, and in that case just try them both and see which is better.
Adding to unwind, you can also use it this way:
OCR1A -= (uint8_t)((temp_sensor[0] < -20) | (tempout > tempset));
| is bitwise OR.
This will remove the JUMP code required for Logical OR (||).
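Here is a sketch of the bitwise variant that compiles off-target. The function adjust_pwm and its parameters are hypothetical stand-ins for the OCR1A register and the sensor variables, whose types the question does not give. Whether the compiler really emits jump-free code is not guaranteed, so check the generated assembly for your AVR toolchain.

```c
#include <stdint.h>

/* Returns the new compare value: ocr minus 1 if either condition holds,
   otherwise ocr unchanged. */
uint16_t adjust_pwm(uint16_t ocr, int16_t sensor0, int16_t tempout, int16_t tempset)
{
    /* Each comparison yields exactly 0 or 1, so the bitwise OR is 0 or 1.
       Unlike ||, both comparisons are always evaluated. */
    return (uint16_t)(ocr - ((sensor0 < -20) | (tempout > tempset)));
}
```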
Can anybody help me regarding quickest method for evaluating three conditions in minimum steps?
I have three conditions, and if any two of them are true, the whole expression is true; otherwise it is false.
I have tried two methods:
if ((condition1 && condition2) ||
(condition1 && condition3) ||
(condition2 && condition3))
Another way is to introduce a variable i:
i = 0;
if (condition1) i++;
if (condition2) i++;
if (condition3) i++;
if (i >= 2)
//do something
I want any other effective method better than the above two.
I am working in a memory constrained environment (ATmega8 with 8 KB of flash memory) and need a solution that works in C.
This can be reduced to:
if((condition1 && (condition2 || condition3)) || (condition2 && condition3))
//do something
Depending on the likelihood of each condition, you may be able to optimize the ordering to get faster short-circuits (although this would probably be premature optimization...)
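The reduction can be checked exhaustively, since there are only eight input combinations. In this sketch the function names are made up for illustration and the conditions are passed in as plain bool values.

```c
#include <stdbool.h>

/* Original form from the question: three pairwise tests. */
bool two_of_three_pairs(bool c1, bool c2, bool c3)
{
    return (c1 && c2) || (c1 && c3) || (c2 && c3);
}

/* Reduced form: c1 is factored out and appears only once. */
bool two_of_three_reduced(bool c1, bool c2, bool c3)
{
    return (c1 && (c2 || c3)) || (c2 && c3);
}

/* Returns true when both forms agree on all eight input combinations. */
bool forms_agree(void)
{
    for (int bits = 0; bits < 8; bits++) {
        bool c1 = bits & 1, c2 = bits & 2, c3 = bits & 4;
        if (two_of_three_pairs(c1, c2, c3) != two_of_three_reduced(c1, c2, c3))
            return false;
    }
    return true;
}
```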
It is always hard to give a just "better" solution (better in what regard -- lines of code, readability, execution speed, number of bytes of machine code instructions, ...?) but since you are asking about execution speed in this case, we can focus on that.
You can introduce that variable you suggest, and use it to reduce the conditions to a simple less-than condition once the answer is known. Less-than conditions trivially translate to two machine code instructions on most architectures (for example, CMP (compare) followed by JL (jump if less than) or JNL (jump if not less than) on Intel IA-32). With a little luck, the compiler will notice (or you can do it yourself, but I prefer the clarity that comes with having the same pattern everywhere) that trues < 2 will always be true in the first two if() statements, and optimize it out.
int trues = 0;
if (trues < 2 && condition1) trues++;
if (trues < 2 && condition2) trues++;
if (trues < 2 && condition3) trues++;
// ...
if (trues >= 2)
{
// do something
}
This, once an answer is known, reduces the possibly complex evaluation of conditionN to a simple less-than comparison, because of the boolean short-circuiting behavior of most languages.
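To illustrate the short-circuit effect, this sketch counts how many conditions are actually evaluated. The helper costly is a hypothetical stand-in for an expensive condition, and the calls parameter exists only for the demonstration.

```c
#include <stdbool.h>

/* Hypothetical "expensive" condition: records that it ran, then
   returns the given result. */
static bool costly(bool result, int *calls)
{
    (*calls)++;
    return result;
}

/* Returns true if at least two of the three results are true.
   *calls receives the number of conditions actually evaluated. */
bool at_least_two(bool r1, bool r2, bool r3, int *calls)
{
    int trues = 0;
    *calls = 0;
    if (trues < 2 && costly(r1, calls)) trues++;
    if (trues < 2 && costly(r2, calls)) trues++;
    if (trues < 2 && costly(r3, calls)) trues++;
    return trues >= 2;
}
```

Note that this pattern only skips work once two conditions are already true. If the first two are false, the third is still evaluated even though the outcome is already decided.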
Another possible variant, if your language allows you to cast a boolean condition to an integer, is to take advantage of that to reduce the number of source code lines. You will still be evaluating each condition, however.
if( (int)(condition1)
+ (int)(condition2)
+ (int)(condition3)
>= 2)
{
// do something
}
This works based on the assumption that casting a boolean FALSE value to an integer results in 0, and casting TRUE results in 1. You can also use the conditional operator for the same effect, although be aware that it may introduce additional branching.
if( ((condition1) ? 1 : 0)
+ ((condition2) ? 1 : 0)
+ ((condition3) ? 1 : 0)
>= 2)
{
// do something
}
Depending on how smart the compiler's optimizer is, it may be able to determine that once any two conditions have evaluated to true the entire condition will always evaluate to true, and optimize based on that.
Note that unless you have actually profiled your code and determined this to be the culprit, this is likely a case of premature optimization. Always strive for code to be readable by human programmers first, and fast to execute by the computer second, unless you can show definitive proof that the particular piece of code you are looking at is an actual performance bottleneck. Learn how that profiler works and put it to good use. Keep in mind that in most cases, programmer time is an awful lot more expensive than CPU time, and clever techniques take longer for the maintenance programmer to parse.
Also, compilers are really clever pieces of software; sometimes they will actually detect the intent of the code written and be able to use specific constructs meant to make those operations faster, but that relies on it being able to determine what you are trying to do. A perfect example of this is swapping two variables using an intermediary variable, which on IA-32 can be done using XCHG eliminating the intermediary variable, but the compiler has to be able to determine that you are actually doing that and not something clever which may give another result in some cases.
Since the vast majority of the non-explicitly-throwaway software written spends the vast majority of its lifetime in maintenance mode (and lots of throwaway software written is alive and well long past its intended best before date), it makes sense to optimize for maintainability unless that comes at an unacceptable cost in other respects. Of course, if you are evaluating those conditions a trillion times inside a tight loop, targeted optimization very well might make sense. But the profiler will tell you exactly which portions of your code need to be scrutinized more closely from a performance point of view, meaning that you avoid complicating the code unnecessarily.
And the above caveats said, I have been working on code recently making changes that at first glance would almost certainly be considered premature detail optimization. If you have a requirement for high performance and use the profiler to determine which parts of the code are the bottlenecks, then the optimizations aren't premature. (They may still be ill-advised, however, depending on the exact circumstances.)
Depends on your language, I might resort to something like:
$cond = array(true, true, false);
if (count(array_filter($cond)) >= 2)
or
if (array_reduce($cond, function ($i, $k) { return $i + (int)$k; }) >= 2)
There is no absolute answer to this. It depends very much on the underlying architecture. E.g. if you describe a hardware circuit in VHDL or Verilog, then for sure the first would give you the fastest result. I assume that you are targeting some kind of CPU, but even here very much will depend on the target CPU, the instructions it supports, and how long they take. Also, you don't specify your target language (e.g. your first approach can be short-circuited, which can heavily impact speed).
If knowing nothing else I would recommend the second solution - just for the reason that your intentions (at least 2 conditions should be true) are better reflected in the code.
The speed difference of the two solutions would be not very high - if this is just some logic and not the part of some innermost loop that is executed many many many times, I would even guess for premature optimization and try to optimize somewhere else.
You may consider simply adding them. If you use the macros from the standard stdbool.h, then true is 1 and (condition1 + condition2 + condition3) >= 2 is what you want.
But it is still a mere micro-optimization; usually you wouldn't gain much performance with this kind of trick.
Since we're not on a deeply pipelined architecture there's probably no value in branch avoidance, which would normally steer the optimisations offered by desktop developers. Short-cuts are golden, here.
If you go for:
if ((condition1 && (condition2 || condition3)) || (condition2 && condition3))
then you probably have the best chance, without depending on any further information, of getting the best machine code out of the compiler. It's possible, in assembly, to do things like have the second evaluation of condition2 branch back to the first evaluation of condition3 to reduce code size, but there's no reliable way to express this in C.
If you know that you will usually fail the test, and you know which two conditions usually cause that, then you might prefer to write:
if ((rare1 || rare2) && (common3 || (rare1 && rare2)))
but there's still a fairly good chance the compiler will completely rearrange that and use its own shortcut arrangement.
You might like to annotate things with __builtin_expect() or _Rarely() or whatever your compiler provides to indicate the likely outcome of a condition.
However, what's far more likely to meaningfully improve performance is recognising any common factors between the conditions or any way in which the conditions can be tested in a way that simplifies the overall test.
For example, if the tests are simple then in assembly you could almost certainly do some basic trickery with carry to accumulate the conditions quickly. Porting that back to C is sometimes viable.
You seem willing to evaluate all the conditions, as you proposed such a solution yourself in your question. If the conditions are very complex formulas that take a long time to compute (on the order of hundreds of milliseconds), then you may consider evaluating all three conditions simultaneously with threads to get a speed-up. Something like:
pthread_create(&t1, detached, eval_condition1, &status);
pthread_create(&t2, detached, eval_condition2, &status);
pthread_create(&t3, detached, eval_condition3, &status);
pthread_mutex_lock(&status.lock);
while (status.trues < 2 && status.falses < 2) {
pthread_cond_wait(&status.cond, &status.lock);
}
pthread_mutex_unlock(&status.lock);
if (status.trues > 1) {
/* do something */
}
Whether this gives you a speed up depends on how expensive it is to compute the conditions. The compute time has to dominate the thread creation and synchronization overheads.
Try this one:
unsigned char i;
i = condition1;
i += condition2;
i += condition3;
if (i & (unsigned char)0x02)
{
/*
At least 2 conditions are True
0b00 - 0 conditions are true
0b01 - 1 conditions are true
0b11 - 3 conditions are true
0b10 - 2 conditions are true
So, Checking 2nd LS bit is good enough.
*/
}
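One caveat with the bit test: it relies on each condition being exactly 0 or 1. Comparison and logical operators in C always yield 0 or 1, but an arbitrary nonzero int (a flag mask, say) would break the sum. A sketch that normalizes first, with a made-up function name and int parameters assumed for illustration:

```c
#include <stdbool.h>

/* Returns true if at least two of the three inputs are nonzero. */
bool at_least_two_bit(int condition1, int condition2, int condition3)
{
    /* !! maps any nonzero value to exactly 1, so the sum stays in 0..3. */
    unsigned char i = (unsigned char)(!!condition1 + !!condition2 + !!condition3);
    /* Bit 1 is set for 2 (0b10) and 3 (0b11), clear for 0 and 1. */
    return (i & 0x02u) != 0;
}
```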
I was wondering if there is a big performance difference in languages, whether you should put the more likely to be executed code in the if or in the else clause. Here is an example:
// x is a random number, or some key code from the user
if(!somespecific_keycode)
do the general stuff
else
do specific stuff
and the other solution
if(somespecific_keycode)
do the specific stuff
else
do general stuff
Prefer to put them in the order that makes the code clearer, which is usually having the more likely to be executed first.
As others said: in terms of performance you should best rely on your compiler and your hardware (branch prediction, speculative execution) to do the right thing.
In case you are really concerned that these two don't help you enough, GCC provides a builtin (__builtin_expect) with which you can explicitly indicate the expected outcome of a branch.
In terms of code readability, I personally like the more likely case to be on top.
Unless you experience a performance problem, don't worry about it.
If you do experience a performance problem, try switching them around and measure which variant is faster, if any of them.
The common rule is to put more likely case first, it's considered to be more readable.
Branch prediction will cause one of those to be more likely, and it will cause a performance difference if the code is inside a loop. But mostly you can ignore that if you are not thinking at assembler level.
This isn't necessarily a performance concern, but I usually go from specific to general to prevent cases like this:
int i = 15;
if(i % 3 == 0)
System.out.println("fizz");
else if(i % 5 == 0)
System.out.println("buzz");
else if(i % 3 == 0 && i % 5 == 0)
System.out.println("fizzbuzz");
Here the above code will never say 'fizzbuzz', because 15 matches both the i % 3 == 0 and i % 5 == 0 conditions. If you re-order into something more specific:
int i = 15;
if(i % 3 == 0 && i % 5 == 0)
System.out.println("fizzbuzz");
else if(i % 3 == 0)
System.out.println("fizz");
else if(i % 5 == 0)
System.out.println("buzz");
Now the above code will reach "fizzbuzz" before getting stopped by the more general conditions
All answers have valid points. Here is an additional one:
Avoid double negations: if not this, then that, else something tends to be confusing for the reader. Hence for the example given, I would favor:
if (somespecific_keycode) {
do_the_specific_stuff();
} else {
do_general_stuff();
}
It mostly doesn't make a difference but sometimes it is easier to read and debug if your ifs are checking if something is true or equal and the else handles when that isn't the case.
As the others have said, it's not going to make a huge difference unless you are using this many many times (in a loop for example). In that case, put the most likely condition first as it will have the earliest opportunity to break out of the condition checking.
It becomes more apparent when you start having many 'else if's.
Any difference that may arise is more related to the context than inherently with if-else constructions. So the best you can do here is develop your own tests to detect any difference.
Unless you are optimizing an already finished system or piece of software, I'd recommend avoiding premature optimization. You've probably already heard that it is evil.
AFAIK with modern optimizing C compilers there is no direct relation between how you organize your if or loop and actual branching instructions in generated code. Moreover different CPUs have different branch prediction algorithms.
Therefore:
Don't optimize until you see bad performance related to this code
If you do optimize, measure and compare different versions
Use realistic data of varied characteristics for performance measurement
Look at assembly code generated by your compiler in both cases.
Given the following statement, which is executed a lot:
iNormVal = iVal / uRatio;
would the following make more sense (performance wise) if uRatio == 1 most (90%) of the time?
if(uRatio > 1)
iNormVal = iVal / uRatio;
else
iNormVal = iVal;
thanks..
Since you spotted this as a potential bottleneck, it's very likely this spot is totally irrelevant for your app's overall speed. Seriously, humans, even guru programmers, are notoriously bad at spotting real bottlenecks. (The difference is that good programmers admit and preach that, while juniors keep spending time for optimizing irrelevant spots.)
Generally I found this approach to optimizations most helpful:
If speed is a major concern, schedule considerable time for optimizations to be done before releasing your app.
Design your code so that it doesn't sport inherent pessimizations.
Implement it the way it's easiest to understand the code. Prevent obvious pessimizations (like passing parameters by value instead of reference), but don't get overexcited.
Check if it is too slow. If so profile the app and identify the hot spots.
Put resources into optimizing (and thereby potentially obfuscating) those hot spots only, iteratively profiling to check which changes help.
Stop when the app is fast enough.
(It's different for library code, obviously, but these few steps would carry you a long way.)
You need to profile this to get a measurement, it's too hard to guess. The compiler might decide you're wrong and remove the test, so check with and without optimizing.
The actual cost of an (integer) division might be rather low, especially on modern desktop-class processors. According to this PDF, the costs on modern (Wolfdale/Nehalem/Sandy Bridge) of a 32/32-bit division are 14-23/17-28/20-28 cycles respectively. So, if you really do this a lot, it might add up. In that case, look into parallel (vectorized) options if possible.
I would try to avoid it if at all possible, since it introduces a branch. Branches have two disadvantages: they make the code more complex by introducing multiple paths that the programmer reading the code has to understand, and they can also introduce execution overhead.
It depends.
Is the code in a performance-critical application? If so, then it may help perf-wise. If not, then I would usually err on the side of readability and not introduce the extra if statement.
Even if it is in a performance-critical application, it is usually the external boundary interactions, such as calls to databases or external services, that account for 95% of the time. Compiled code usually executes very quickly and if statements are very cheap. When we profile our code, it is rare that we make a change such as the one you describe for perf reasons alone. Misuse of looping and the like may sometimes crop up, but we rarely add if statements like the one described.
Hope this helps...
If you decide to go with branching then you could check first for the common case. It is slightly more readable and should be slightly better performance wise.
if(uRatio <= 1) {
iNormVal = iVal;
}
else {
iNormVal = iVal / uRatio;
}
To be more readable you could add a local variable with a good name that holds the result of the expression.
unsigned int uSmallRatio = uRatio <= 1;
if(uSmallRatio) {
iNormVal = iVal;
}
else {
iNormVal = iVal / uRatio;
}
The compiler could optimize this into the same machine code as the first approach. I'm not sure about this though.
Similarly you could do this but it is not pretty:
iNormVal = uRatio <= 1 ? iVal : iVal / uRatio;
Finally another approach would be:
iNormVal = iVal;
if(uRatio > 1) { /*explain why you do this so it won't be changed by somebody else*/
iNormVal = iVal / uRatio;
}
I'm sure there are other approaches to consider.
Regards...
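For comparison, here is a sketch of the plain and the branching variants as functions. The types are assumptions read off the Hungarian-style names (iVal as int, uRatio as unsigned int), and the function names are made up. One behavioral difference is worth knowing: the branching version returns iVal unchanged when uRatio is 0, whereas the plain division is undefined in that case.

```c
/* Plain division, as in the original statement.
   Must not be called with uRatio == 0. */
int normalize_plain(int iVal, unsigned int uRatio)
{
    /* The cast keeps the division signed; assumes uRatio fits in an int. */
    return iVal / (int)uRatio;
}

/* Common case first: skip the division when the ratio is 0 or 1. */
int normalize_branch(int iVal, unsigned int uRatio)
{
    if (uRatio <= 1)
        return iVal;
    return iVal / (int)uRatio;
}
```

Both give the same result for any ratio of 1 or more, so either can be dropped into a profiling run to see whether the branch pays off for your data.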
The if clause will actually most likely make the program slower. Branching is really bad for performance because modern processors are pipelined, and branches prevent the pipeline from being fully effective. This is such a significant issue that considerable effort goes into branch prediction, but that's not going to help in this case. Even if the prediction is right 90% of the time, that means an empty pipeline 10% of the time, which is a lot worse than an int division (especially when taking into account that the if clause itself takes time).
But most likely it does not matter at all because your code spends most of its time in a completely different place, making this whole question a huge waste of time.
Most performance issues are either ideological (you designed way wrong), or implementation of a proven slower algorithm (given choices).
Beyond that, performance gains are going to be at the assembly level, and will be platform dependent.
I can hardly recommend this as an actual concern for performance, unless you're really strapped for performance, at which point you need to go check the above first.
All you've done is raise the eyebrows of the person that will maintain your code. Hope you have a lazy programmer that leaves stuff alone, or you'll end up losing this code anyway.
Because you have no guarantees at this level, of how code will perform on different platforms given different compiler, compiler options, and optimizations, you may even lose the code to compiler optimization. It's best to focus on larger issues.