Can anybody suggest the quickest way to evaluate three conditions in the fewest steps?
I have three conditions, and if any two of them are true, the whole expression should evaluate to true; otherwise false.
I have tried two methods:
if ((condition1 && condition2) ||
(condition1 && condition3) ||
(condition2 && condition3))
Another way is by introducing a counter variable i:
i = 0;
if (condition1) i++;
if (condition2) i++;
if (condition3) i++;
if (i >= 2)
//do something
I want any other effective method better than the above two.
I am working in a memory-constrained environment (an ATmega8 with 8 KB of flash memory) and need a solution that works in C.
This can be reduced to:
if((condition1 && (condition2 || condition3)) || (condition2 && condition3))
//do something
Depending on the likelihood of each condition, you may be able to optimize the ordering to get faster short-circuits (although this would probably be premature optimization...)
It is always hard to give a just "better" solution (better in what regard -- lines of code, readability, execution speed, number of bytes of machine code instructions, ...?) but since you are asking about execution speed in this case, we can focus on that.
You can introduce the variable you suggest, and use it to reduce the conditions to a simple less-than comparison once the answer is known. A less-than test translates trivially to two machine code instructions on most architectures (for example, CMP (compare) followed by JL (jump if less than) or JNL (jump if not less than) on Intel IA-32). With a little luck, the compiler will notice that trues < 2 is always true in the first two if() statements and optimize it out (or you can do that yourself, though I prefer the clarity that comes with having the same pattern everywhere).
int trues = 0;
if (trues < 2 && condition1) trues++;
if (trues < 2 && condition2) trues++;
if (trues < 2 && condition3) trues++;
// ...
if (trues >= 2)
{
// do something
}
This, once an answer is known, reduces the possibly complex evaluation of conditionN to a simple less-than comparison, because of the boolean short-circuiting behavior of most languages.
Another possible variant, if your language allows you to cast a boolean condition to an integer, is to take advantage of that to reduce the number of source code lines. You will still be evaluating each condition, however.
if( (int)(condition1)
+ (int)(condition2)
+ (int)(condition3)
>= 2)
{
// do something
}
This works based on the assumption that casting a boolean FALSE value to an integer results in 0, and casting TRUE results in 1. You can also use the conditional operator for the same effect, although be aware that it may introduce additional branching.
if( ((condition1) ? 1 : 0)
+ ((condition2) ? 1 : 0)
+ ((condition3) ? 1 : 0)
>= 2)
{
// do something
}
Depending on how smart the compiler's optimizer is, it may be able to determine that once any two conditions have evaluated to true, the entire condition will always evaluate to true, and optimize based on that.
Note that unless you have actually profiled your code and determined this to be the culprit, this is likely a case of premature optimization. Always strive for code to be readable by human programmers first, and fast to execute by the computer second, unless you can show definitive proof that the particular piece of code you are looking at is an actual performance bottleneck. Learn how that profiler works and put it to good use. Keep in mind that in most cases, programmer time is an awful lot more expensive than CPU time, and clever techniques take longer for the maintenance programmer to parse.
Also, compilers are really clever pieces of software; sometimes they will detect the intent of the code and use specific constructs meant to make those operations faster, but that relies on being able to determine what you are trying to do. A perfect example is swapping two variables through an intermediary variable, which on IA-32 can be done with XCHG, eliminating the intermediary variable entirely, but the compiler has to be able to determine that you are actually doing that and not something subtly different which may give another result in some cases.
Since the vast majority of non-explicitly-throwaway software spends the vast majority of its lifetime in maintenance mode (and lots of throwaway software is alive and well long past its intended best-before date), it makes sense to optimize for maintainability unless that comes at an unacceptable cost in other respects. Of course, if you are evaluating those conditions a trillion times inside a tight loop, targeted optimization very well might make sense. But the profiler will tell you exactly which portions of your code need to be scrutinized more closely from a performance point of view, meaning that you avoid complicating the code unnecessarily.
And the above caveats said, I have been working on code recently making changes that at first glance would almost certainly be considered premature detail optimization. If you have a requirement for high performance and use the profiler to determine which parts of the code are the bottlenecks, then the optimizations aren't premature. (They may still be ill-advised, however, depending on the exact circumstances.)
Depending on your language, I might resort to something like (PHP here):
$cond = array(true, true, false);
if (count(array_filter($cond)) >= 2)
or
if (array_reduce($cond, function ($i, $k) { return $i + (int)$k; }) >= 2)
There is no absolute answer to this; it depends very much on the underlying architecture. E.g. if you are programming some hardware circuit in VHDL or Verilog, then for sure the first would give you the fastest result. I assume you are targeting some kind of CPU, but even here much will depend on the target CPU, the instructions it supports, and how long they take. Also, you don't specify your target language (e.g. your first approach can be short-circuited, which can heavily impact speed).
Knowing nothing else, I would recommend the second solution, simply because your intent (at least two conditions should be true) is better reflected in the code.
The speed difference between the two solutions would not be very high. If this is just some logic and not part of an innermost loop that is executed many, many times, I would even call this premature optimization and try to optimize somewhere else.
You may consider simply adding them. If you use the macros from the standard stdbool.h, then true is 1 and (condition1 + condition2 + condition3) >= 2 is what you want.
But this is still a mere micro-optimization; you usually won't gain much performance from this kind of trick.
Since we're not on a deeply pipelined architecture there's probably no value in branch avoidance, which would normally steer the optimisations offered by desktop developers. Short-cuts are golden, here.
If you go for:
if ((condition1 && (condition2 || condition3)) || (condition2 && condition3))
then you probably have the best chance, without depending on any further information, of getting the best machine code out of the compiler. It's possible, in assembly, to do things like have the second evaluation of condition2 branch back to the first evaluation of condition3 to reduce code size, but there's no reliable way to express this in C.
If you know that you will usually fail the test, and you know which two conditions usually cause that, then you might prefer to write:
if ((rare1 || rare2) && (common3 || (rare1 && rare2)))
but there's still a fairly good chance the compiler will completely rearrange that and use its own shortcut arrangement.
You might like to annotate things with __builtin_expect() or _Rarely() or whatever your compiler provides to indicate the likely outcome of a condition.
However, what's far more likely to meaningfully improve performance is recognising any common factors between the conditions or any way in which the conditions can be tested in a way that simplifies the overall test.
For example, if the tests are simple then in assembly you could almost certainly do some basic trickery with carry to accumulate the conditions quickly. Porting that back to C is sometimes viable.
You seem willing to evaluate all the conditions, since you proposed such a solution yourself in your question. If the conditions are very complex formulas that take many CPU cycles to compute (on the order of hundreds of milliseconds), you may consider evaluating all three conditions simultaneously with threads to get a speed-up. Something like:
pthread_create(&t1, detached, eval_condition1, &status);
pthread_create(&t2, detached, eval_condition2, &status);
pthread_create(&t3, detached, eval_condition3, &status);
pthread_mutex_lock(&status.lock);
while (status.trues < 2 && status.falses < 2) {
pthread_cond_wait(&status.cond, &status.lock);
}
pthread_mutex_unlock(&status.lock);
if (status.trues > 1) {
/* do something */
}
Whether this gives you a speed up depends on how expensive it is to compute the conditions. The compute time has to dominate the thread creation and synchronization overheads.
Try this one:
unsigned char i; /* assumes each condition evaluates to 0 or 1 */
i = condition1;
i += condition2;
i += condition3;
if (i & 0x02)
{
    /*
       At least 2 conditions are true when bit 1 is set:
       0b00 - 0 conditions true
       0b01 - 1 condition  true
       0b10 - 2 conditions true
       0b11 - 3 conditions true
       So checking the 2nd least-significant bit is good enough.
    */
}
Related
Before I begin, yes, I'm aware of the compiler built-ins __builtin_expect and __builtin_unpredictable (Clang). They do solve the issue to some extent, but my question is about something neither completely solves.
As a very simple example, suppose we have the following code.
void highly_contrived_example(unsigned int * numbers, unsigned int count) {
unsigned int * const end = numbers + count;
for (unsigned int * iterator = numbers; iterator != end; ++ iterator)
foo(* iterator % 2 == 0 ? 420 : 69);
}
Nothing complicated at all. Just calls foo() with 420 whenever the current number is even, and with 69 when it isn't.
Suppose, however, that it is known ahead of time that the data is guaranteed to look a certain way. For example, if it were always random, then a conditional select (csel (ARM), cmov (x86), etc) possibly would be better than a branch.⁰ If it were always in highly predictable patterns (e.g. always a lengthy stream of evens/odds before a lengthy stream of the other, and so on), then a branch would be better.⁰ __builtin_expect would not really solve the issue if the number of evens/odds were about equal, and I'm not sure whether the absence of __builtin_unpredictable would influence branchiness (plus, it's Clang-only).
My current "solution" is to lie to the compiler and use __builtin_expect with a high probability of whichever side, to influence the compiler to generate a branch in the predictable case (for simple cases like this, all it seems to do is change the ordering of the comparison to suit the expected probability), and __builtin_unpredictable to influence it to not generate a branch, if possible, in the unpredictable case.¹ Either that or inline assembly. That's always fun to use.
⁰ Although I have not actually done any benchmarks, I'm aware that even using a branch may not necessarily be faster than a conditional select for the given example. The example is only for illustrative purposes, and may not actually exhibit the problem described.
¹ Modern compilers are smart. More often than not, they can determine reasonably well which approach to actually use. My question is for the niche cases in which they cannot reasonably figure that out, and in which the performance difference actually matters.
Is there a difference in performance between
if(array[i] == -1){
doThis();
}
else {
doThat();
}
and
if(array[i] != -1){
doThat();
}
else {
doThis();
}
when I already know that there is only one element (or in general, few elements) with the value -1?
That will depend entirely on how your compiler chooses to optimise it. You have no guarantee as to which is faster. If you really need to give hints to the compiler, look at the likely and unlikely macros in the Linux kernel, which are defined thus:
#define likely(x) __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
Which means you can use
if (likely(something)) { ... }
or
if (unlikely(something)) { ... }
Details here: http://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html
Moral: write your code for readability, and not how you think the compiler will optimise it, as you are likely to be wrong.
Performance is always implementation dependent. If it is sufficiently important to you, then you need to benchmark it in your environment.
Having said that: there is probably no difference, because modern compilers are likely to turn both versions into equally efficient machine code.
One thing that might cause a difference is if the different code order changes the compiler's branch prediction heuristics. This can occasionally make a noticeable difference.
The compiler wouldn't know about your actual data, so it will produce roughly the same low-level code.
However, given that if-statements compile to compare-and-branch instructions, your second version may run a little faster: when the value is not -1, execution falls through to the very next instruction, whereas in the first version the code would need to jump to a new instruction address, which may be costly, especially when you deal with a large number of values (say, millions).
That would depend on which condition is encountered first. As such there is not such a big difference.
If you have to test a lot of values, a switch statement would be faster than nested if-else.
I was wondering if there is a big performance difference in languages, whether you should put the more likely to be executed code in the if or in the else clause. Here is an example:
// x is a random number, or some key code from the user
if(!somespecific_keycode)
do the general stuff
else
do specific stuff
and the other solution
if(somespecific_keycode)
do the specific stuff
else
do general stuff
Prefer to put them in the order that makes the code clearer, which is usually having the more likely to be executed first.
As others said: in terms of performance you should best rely on your compiler and your hardware (branch prediction, speculative execution) to do the right thing.
In case you are really concerned that these two don't help you enough, GCC provides a builtin (__builtin_expect) with which you can explicitly indicate the expected outcome of a branch.
In terms of code readability, I personally like the more likely case to be on top.
Unless you experience a performance problem, don't worry about it.
If you do experience a performance problem, try switching them around and measure which variant is faster, if any of them.
The common rule is to put more likely case first, it's considered to be more readable.
Branch prediction will make one of them more likely, and that can cause a performance difference inside a loop. But you can mostly ignore it if you are not thinking at the assembler level.
This isn't necessarily a performance concern, but I usually go from specific to general to prevent cases like this:
int i = 15;
if(i % 3 == 0)
System.out.println("fizz");
else if(i % 5 == 0)
System.out.println("buzz");
else if(i % 3 == 0 && i % 5 == 0)
System.out.println("fizzbuzz");
Here the above code will never say 'fizzbuzz', because 15 matches both the i % 3 == 0 and i % 5 == 0 conditions. If you re-order into something more specific:
int i = 15;
if(i % 3 == 0 && i % 5 == 0)
System.out.println("fizzbuzz");
else if(i % 3 == 0)
System.out.println("fizz");
else if(i % 5 == 0)
System.out.println("buzz");
Now the above code will reach "fizzbuzz" before getting stopped by the more general conditions
All answers have valid points. Here is an additional one:
Avoid double negations: if not this, then that, else something tends to be confusing for the reader. Hence for the example given, I would favor:
if (somespecific_keycode) {
do_the_specific_stuff();
} else {
do_general_stuff();
}
It mostly doesn't make a difference, but sometimes the code is easier to read and debug if the if checks whether something is true or equal, and the else handles the case where it isn't.
As the others have said, it's not going to make a huge difference unless you execute this many, many times (in a loop, for example). In that case, put the most likely condition first, as it will have the earliest opportunity to break out of the condition checking.
This becomes more apparent when you start having many 'else if's.
Any difference that may arise is more related to the context than inherent in if-else constructions, so the best you can do here is develop your own tests to detect any difference.
Unless you are optimizing an already-finished system or piece of software, I'd recommend avoiding premature optimizations. You've probably already heard that they are evil.
AFAIK with modern optimizing C compilers there is no direct relation between how you organize your if or loop and actual branching instructions in generated code. Moreover different CPUs have different branch prediction algorithms.
Therefore:
Don't optimize until you see bad performance related to this code
If you do optimize, measure and compare different versions
Use realistic data of varied characteristics for performance measurement
Look at assembly code generated by your compiler in both cases.
Given the code :
for (int i = 0; i < n; ++i)
{
A(i) ;
B(i) ;
C(i) ;
}
And the optimization version :
for (int i = 0; i < (n - 2); i+=3)
{
A(i);
A(i+1);
A(i+2);
B(i);
B(i+1);
B(i+2);
C(i);
C(i+1);
C(i+2);
}
Something is not clear to me: which version is better? I can't see anything that makes the second one run any faster. Am I missing something here?
All I see is that each instruction depends on the previous one, meaning that I have to wait for the previous instruction to finish before starting the next...
Thanks
In the high-level view of a language, you're not going to see the optimization. The speed enhancement comes from what the compiler does with what you have.
In the first case, it's something like:
LOCATION_FLAG;
DO_SOMETHING;
TEST FOR LOOP COMPLETION;//Jumps to LOCATION_FLAG if false
In the second it's something like:
LOCATION_FLAG;
DO_SOMETHING;
DO_SOMETHING;
DO_SOMETHING;
TEST FOR LOOP COMPLETION;//Jumps to LOCATION_FLAG if false
You can see in the latter case, the overhead of testing and jumping is only 1 instruction per 3. In the first it's 1 instruction per 1; so it happens a lot more often.
Therefore, if you have invariants you can rely on (an array length that is a multiple of 3, to use your example), then it is more efficient to unroll loops, because the underlying assembly is written more directly.
Loop unrolling is used to reduce the number of jump & branch instructions which could potentially make the loop faster but will increase the size of the binary. Depending on the implementation and platform, either could be faster.
Well, whether this code is "better" or "worse" totally depends on implementations of A, B and C, which values of n you expect, which compiler you are using and which hardware you are running on.
Typically the benefit of loop unrolling is that the overhead of doing the loop (that is, increasing i and comparing it with n) is reduced. In this case, could be reduced by a factor of 3.
As long as the functions A(), B() and C() don't modify the same datasets, the second version provides more parallelization options.
In the first version, the three functions could run simultaneously, assuming no interdependencies. In the second version, all three functions could be run with all three datasets at the same time, assuming you had enough execution units to do so and again, no interdependencies.
Generally it's not a good idea to try to "invent" optimizations unless you have hard evidence that you will gain an improvement, because many times you may end up introducing a degradation. Typically the best way to obtain such evidence is with a good profiler. I would test both versions of this code with a profiler to see the difference.
Also, loop unrolling often isn't very portable; as mentioned previously, it depends greatly on the platform, compiler, etc.
You can additionally play with the compiler options. An interesting gcc option is "-floop-optimize", that you get automatically with "-O, -O2, -O3, and -Os"
EDIT Additionally, look at the "-funroll-loops" compiler option.
Let's take a simple example of two lines supposedly doing the same thing:
if (value >= 128 || value < 0)
...
or
if (value & ~ 127)
...
Say 'if's are costly in a loop of thousands of iterations: is it better to stick with the traditional C syntax, or to find a binary-optimized alternative where possible?
I would use the first statement, with the traditional syntax, as it is more readable.
The second statement is hard on the eyes.
Care about the programmers who will use the code after you.
In 99 cases out of 100, do the one which is more readable and expresses your intent better.
In theory, compilers will do this sort of optimization for you. In practice, they might not. This particular example is a little bit subtle, because the two are not equivalent, unless you make some assumptions about value and on whether or not signed arithmetic is 2's complement on your target platform.
Use whichever you find more readable. If and when you have evidence that the performance of this particular test is critical, use whatever gives you the best performance. Personally, I would probably write:
if ((unsigned int)value >= 128U)
because that's more intuitive to me, and more likely to get handled well by most compilers I've worked with.
It depends on how and where the check is done. If the check happens once during program start-up to validate a command-line parameter, then the performance issue is completely moot and you should use whatever is more natural.
On the other hand, if the check was inside some inner loop that is happening millions of times a second, then it may matter. But don't assume one will be better; you should create both versions and time them to see if there is any measurable difference between the two.