Is there any performance difference between the following two cases? - c

Is there any performance difference between the following two cases?
First:
int test_some_condition(void);
if (some_variable == 2 && test_some_condition())
{
    //body
}
Second:
int test_some_condition(void);
if (some_variable == 2)
{
    if (test_some_condition())
    {
        //body
    }
}
UPDATE: I know how to create a test and measure the performance of each case, or to look at the assembly produced for each case, but I am sure I am not the first one to come across this question, and it would be great if someone who has already tested this could give me a simple yes/no answer.

Difference in terms of what?
Readability? Yes, there's a difference. The first is much clearer and much better expresses your intent. And it also takes up fewer lines in the editor, which is not necessarily an advantage in itself, but it does make the code easier to read and grok at a glance for anyone who comes behind and wants to edit it.
Performance/"speed"? No. I'd be willing to bet actual money that there is absolutely no discernible difference once you run those two snippets of code through a compiler with optimizations turned on. And it wouldn't take much to convince me to bet on that same case even with optimizations disabled.
Why? Because in C (and all of the C-derived languages that I know of), the && operator performs short-circuit evaluation, which means that if the first condition evaluates to false, then it doesn't even bother to evaluate the second condition, because there's no way that the entire statement could ever turn out to be true.
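To make the short-circuit behaviour concrete, here is a minimal, self-contained sketch (the helper name mirrors the question, but its body is made up) showing that the right-hand operand is never evaluated when the left-hand one is false:

#include <stdio.h>

/* Hypothetical helper with a visible side effect, so we can tell
   whether it was ever called. */
static int test_some_condition(void)
{
    puts("test_some_condition() was called");
    return 1;
}

int main(void)
{
    int some_variable = 3;  /* not 2, so the left operand is false */

    /* Because && short-circuits, test_some_condition() is never called
       here and nothing is printed before "done". */
    if (some_variable == 2 && test_some_condition())
    {
        puts("body");
    }

    puts("done");
    return 0;
}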
Nesting the if statements was a common "optimization" trick in the bad old days of VB 6 when the And operator did not perform short-circuit evaluation. There's no purpose that I can imagine to using it in C code, unless it enhances readability. And honestly, if you run across a compiler that doesn't render these two code snippets completely equivalent in terms of performance, then it's time to throw that compiler away and stop using it. This is the most basic optimization under the sun, a "low-hanging fruit" for compiler writers. If they can't get this right, I wouldn't trust them with the rest of your code.
But, in general, worrying about this sort of thing (which definitely falls under the category of a "micro-optimization") is not helping you to write better code or become a better programmer. It's just causing you to waste a lot of time asking questions on Stack Overflow and contributing to the reputation of users like me who post this same answer to 2–3 similar questions a week. And that's time you're not spending writing code and improving your skills in tangible ways.

The only way anyone can really tell with a modern compiler is to look at the machine code.

There should be no difference. If there is a difference, it's likely not measurable (even if you do it millions of times you won't get conclusive results).

Testing these two examples in a loop running 10,000,000 times gives:
$ time ./test1
real 0m0.045s
user 0m0.044s
sys 0m0.001s
$ time ./test2
real 0m0.045s
user 0m0.043s
sys 0m0.003s
Also, keep in mind that if the first part of the expression fails, the second expression will never be evaluated. This is maybe not so clear in the first example.
Also, in conditions where the first evaluated expression returns false:
$ time ./test1_1
real 0m0.035s
user 0m0.034s
sys 0m0.001s
$ time ./test2_1
real 0m0.035s
user 0m0.034s
sys 0m0.000s

If you are wondering whether the first version always calls test_some_condition while the second version calls it only if the first condition is true, then the answer is that both versions are equivalent, because the && operator is lazy and will not evaluate its second operand if the first is already false.
The standard guarantees this behaviour. This makes it legal to say:
if (array_size > n && my_array[n] == 1) { ... }
This would be broken code without the laziness guarantee.

The two samples are logically equivalent, provided that compound conditions are lazily evaluated, meaning that in a && b, b won't be evaluated when a is known to be false. So even if there were a difference I wouldn't care because that might be an artefact (or a bug) of the compiler you're using, and it might change with the next release or bugfix.

Related

Does gcc always do this kind of optimization? (common subexpression elimination)

As an example, assume that the expression sys->pot.atoms[item->P.kind].mass is evaluated inside a loop. The loop only changes item, so the expression can be simplified as atoms[item->P.kind].mass by defining a variable as atoms = sys->pot.atoms before the loop. Do modern compilers like gcc perform this kind of optimization automatically (if optimization is enabled)? And is it reliable regardless of the number of expressions like atoms[item->P.kind].mass existing inside a loop?
Yes, it is a very common optimisation called loop-invariant code motion, also known as hoisting or scalar promotion, often performed as a side effect of common subexpression elimination.
It is valid to compute sys->pot.atoms just once before the loop if the compiler can ascertain that neither sys nor sys->pot.atoms can be modified inside the loop.
Note however, as commented by Groo, that if sys or sys->pot or sys->pot.atoms are specified as volatile, then it would be incorrect to compute it only once if the expression sys->pot.atoms is evaluated multiple times in the loop body or expressions.
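As an illustrative sketch only (the struct definitions below are assumptions; just the member chain sys->pot.atoms[item->P.kind].mass comes from the question), the manual version of the hoist looks like this:

struct atom { double mass; /* ... */ };
struct pot  { struct atom *atoms; /* ... */ };
struct sys  { struct pot pot; /* ... */ };
struct item { struct { int kind; } P; /* ... */ };

double total_mass(const struct sys *sys, const struct item *items, int n)
{
    const struct atom *atoms = sys->pot.atoms;  /* hoisted once, before the loop */
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += atoms[items[i].P.kind].mass;     /* invariant part not re-evaluated */
    return sum;
}

This is essentially what the compiler does internally when it can prove that sys and sys->pot.atoms do not change inside the loop.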
It's a very common optimization.
And is it reliable regardless of the number of expressions
No, because optimizations are not something you can rely on happening in general. The C standard says nothing about them, so it's up to the maker of the compiler to give guarantees or not, and that is not something compiler makers usually do for the optimizer. The optimizer works on a "best effort" basis, and a missed optimization is often treated like a flaw rather than an actual bug.
EDIT:
From discussion in comments, I found it useful to mention that just because a certain optimization was performed, that does not guarantee faster code. For instance, the benefit of loop unrolling is that the test in the loop does not need to be performed every iteration. But on the other hand, longer code can be less cache friendly. So asking if it's guaranteed that a certain optimization is performed or not does not really give any useful information.
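For reference, this is the kind of transformation loop unrolling refers to; a hand-unrolled sketch of a trivial accumulation loop (whether it is actually faster depends on the cache effects just mentioned):

#include <stddef.h>

/* Sum an array with the loop body unrolled by four: the loop test runs
   once per four elements instead of once per element. */
long sum_unrolled(const int *a, size_t n)
{
    long sum = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
    {
        sum += a[i];
        sum += a[i + 1];
        sum += a[i + 2];
        sum += a[i + 3];
    }
    for (; i < n; i++)   /* handle the leftover elements */
        sum += a[i];
    return sum;
}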
I always wonder where I should do optimization myself, and where I should sit back, relax and leave it to the compiler.
That's very hard to know in advance. Guys like Linus Torvalds can basically see the assembly code in their head just by watching the C code, but for us mere mortals, it comes down to benchmarking and profiling.
Before even considering micro-optimizations, perform these checks:
Make sure that the code you're about to optimize actually is a bottleneck
Make sure you're using a good algorithm
Make sure the code is cache friendly

What is the fastest algorithm for permutations of three conditions?

Can anybody help me regarding the quickest method for evaluating three conditions in the minimum number of steps?
I have three conditions, and if any two of them come out to be true, then the whole expression becomes true, else false.
I have tried two methods:
if ((condition1 && condition2) ||
    (condition1 && condition3) ||
    (condition2 && condition3))
Another way is by introducing a variable i:
i = 0;
if (condition1) i++;
if (condition2) i++;
if (condition3) i++;
if (i >= 2)
    //do something
I want any other effective method better than the above two.
I am working in a memory-constrained environment (ATmega8 with 8 KB of flash memory) and need a solution that works in C.
This can be reduced to:
if ((condition1 && (condition2 || condition3)) || (condition2 && condition3))
    //do something
Depending on the likelihood of each condition, you may be able to optimize the ordering to get faster short-circuits (although this would probably be premature optimization...)
It is always hard to give just a "better" solution (better in what regard: lines of code, readability, execution speed, number of bytes of machine code instructions, ...?), but since you are asking about execution speed in this case, we can focus on that.
You can introduce that variable you suggest, and use it to reduce the conditions to a simple less-than condition once the answer is known. Less-than conditions trivially translate to two machine code instructions on most architectures (for example, CMP (compare) followed by JL (jump if less than) or JNL (jump if not less than) on Intel IA-32). With a little luck, the compiler will notice (or you can do it yourself, but I prefer the clarity that comes with having the same pattern everywhere) that trues < 2 will always be true in the first two if() statements, and optimize it out.
int trues = 0;
if (trues < 2 && condition1) trues++;
if (trues < 2 && condition2) trues++;
if (trues < 2 && condition3) trues++;
// ...
if (trues >= 2)
{
    // do something
}
This, once an answer is known, reduces the possibly complex evaluation of conditionN to a simple less-than comparison, because of the boolean short-circuiting behavior of most languages.
Another possible variant, if your language allows you to cast a boolean condition to an integer, is to take advantage of that to reduce the number of source code lines. You will still be evaluating each condition, however.
if ((int)(condition1)
    + (int)(condition2)
    + (int)(condition3)
    >= 2)
{
    // do something
}
This works based on the assumption that casting a boolean FALSE value to an integer results in 0, and casting TRUE results in 1. You can also use the conditional operator for the same effect, although be aware that it may introduce additional branching.
if (((condition1) ? 1 : 0)
    + ((condition2) ? 1 : 0)
    + ((condition3) ? 1 : 0)
    >= 2)
{
    // do something
}
Depending on how smart the compiler's optimizer is, it may be able to determine that once any two conditions have evaluated to true the entire condition will always evaluate to true, and optimize based on that.
Note that unless you have actually profiled your code and determined this to be the culprit, this is likely a case of premature optimization. Always strive for code to be readable by human programmers first, and fast to execute by the computer second, unless you can show definitive proof that the particular piece of code you are looking at is an actual performance bottleneck. Learn how that profiler works and put it to good use. Keep in mind that in most cases, programmer time is an awful lot more expensive than CPU time, and clever techniques take longer for the maintenance programmer to parse.
Also, compilers are really clever pieces of software; sometimes they will actually detect the intent of the code written and be able to use specific constructs meant to make those operations faster, but that relies on it being able to determine what you are trying to do. A perfect example of this is swapping two variables using an intermediary variable, which on IA-32 can be done using XCHG eliminating the intermediary variable, but the compiler has to be able to determine that you are actually doing that and not something clever which may give another result in some cases.
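For reference, the plain temporary-variable swap being described looks like this; a compiler that recognises the pattern may emit an XCHG or simply rename registers instead of materialising the temporary:

void swap_ints(int *x, int *y)
{
    int tmp = *x;   /* the "intermediary variable" the compiler can elide */
    *x = *y;
    *y = tmp;
}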
Since the vast majority of the non-explicitly-throwaway software written spends the vast majority of its lifetime in maintenance mode (and lots of throwaway software written is alive and well long past its intended best-before date), it makes sense to optimize for maintainability unless that comes at an unacceptable cost in other respects. Of course, if you are evaluating those conditions a trillion times inside a tight loop, targeted optimization very well might make sense. But the profiler will tell you exactly which portions of your code need to be scrutinized more closely from a performance point of view, meaning that you avoid complicating the code unnecessarily.
And the above caveats said, I have been working on code recently making changes that at first glance would almost certainly be considered premature detail optimization. If you have a requirement for high performance and use the profiler to determine which parts of the code are the bottlenecks, then the optimizations aren't premature. (They may still be ill-advised, however, depending on the exact circumstances.)
Depending on your language, I might resort to something like:
$cond = array(true, true, false);
if (count(array_filter($cond)) >= 2)
or
if (array_reduce($cond, function ($i, $k) { return $i + (int)$k; }) >= 2)
There is no absolute answer to this; it depends very much on the underlying architecture. E.g., if you program some hardware circuit in VHDL or Verilog, then the first would surely give you the fastest result. I assume that you target some kind of CPU, but even here very much will depend on the target CPU, the instructions it supports, and how long they take. Also, you don't specify your target language (e.g., your first approach can be short-circuited, which can heavily impact speed).
Knowing nothing else, I would recommend the second solution, simply because your intention (at least 2 conditions should be true) is better reflected in the code.
The speed difference between the two solutions would not be very high. If this is just some logic and not part of some innermost loop that is executed many, many times, I would even call it premature optimization and try to optimize somewhere else.
You may consider simply adding them. If you use the macros from the standard stdbool.h, then true is 1 and (condition1 + condition2 + condition3) >= 2 is what you want.
But it is still a mere micro-optimization; you usually won't gain much from this kind of trick.
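For reference, the suggestion spelled out as a minimal sketch (condition1..condition3 stand in for the real tests):

#include <stdbool.h>

/* Each relational/logical expression in C evaluates to 0 or 1, so the
   sum is simply the number of conditions that are true. */
bool at_least_two(bool condition1, bool condition2, bool condition3)
{
    return (condition1 + condition2 + condition3) >= 2;
}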
Since we're not on a deeply pipelined architecture, there's probably no value in branch avoidance, which would normally steer the optimisations offered by desktop developers. Short-circuits are golden here.
If you go for:
if ((condition1 && (condition2 || condition3)) || (condition2 && condition3))
then you probably have the best chance, without depending on any further information, of getting the best machine code out of the compiler. It's possible, in assembly, to do things like have the second evaluation of condition2 branch back to the first evaluation of condition3 to reduce code size, but there's no reliable way to express this in C.
If you know that you will usually fail the test, and you know which two conditions usually cause that, then you might prefer to write:
if ((rare1 || rare2) && (common3 || (rare1 && rare2)))
but there's still a fairly good chance the compiler will completely rearrange that and use its own shortcut arrangement.
You might like to annotate things with __builtin_expect() or _Rarely() or whatever your compiler provides to indicate the likely outcome of a condition.
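For example, with GCC (and compilers that imitate it) such an annotation might look like the sketch below; the likely()/unlikely() macro names are just a widespread convention, not something the compiler itself provides:

/* Wrap GCC's __builtin_expect in the usual convenience macros. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

int mostly_fails(int rare1, int rare2, int common3)
{
    /* Tells the compiler which outcome to treat as the fall-through path. */
    if (unlikely(rare1) && unlikely(rare2))
        return 1;
    return common3 != 0;
}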
However, what's far more likely to meaningfully improve performance is recognising any common factors between the conditions or any way in which the conditions can be tested in a way that simplifies the overall test.
For example, if the tests are simple then in assembly you could almost certainly do some basic trickery with carry to accumulate the conditions quickly. Porting that back to C is sometimes viable.
You seem willing to evaluate all the conditions, as you proposed such a solution yourself in your question. If the conditions are very complex formulas that take many CPU cycles to compute (on the order of hundreds of milliseconds), then you may consider evaluating all three conditions simultaneously with threads to get a speed-up. Something like:
pthread_create(&t1, detached, eval_condition1, &status);
pthread_create(&t2, detached, eval_condition2, &status);
pthread_create(&t3, detached, eval_condition3, &status);
pthread_mutex_lock(&status.lock);
while (status.trues < 2 && status.falses < 2) {
    pthread_cond_wait(&status.cond, &status.lock);
}
pthread_mutex_unlock(&status.lock);
if (status.trues > 1) {
    /* do something */
}
Whether this gives you a speed up depends on how expensive it is to compute the conditions. The compute time has to dominate the thread creation and synchronization overheads.
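For completeness, the shared bookkeeping that fragment assumes might look roughly like the sketch below; the struct layout and worker body are guesses that simply mirror the names used above, not the poster's actual code:

#include <pthread.h>

struct cond_status {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int trues;
    int falses;
};

/* One worker per condition: record the result and wake the waiter. */
static void *eval_condition1(void *arg)
{
    struct cond_status *status = arg;
    int result = 0;  /* ... evaluate the expensive condition here ... */

    pthread_mutex_lock(&status->lock);
    if (result)
        status->trues++;
    else
        status->falses++;
    pthread_cond_signal(&status->cond);
    pthread_mutex_unlock(&status->lock);
    return NULL;
}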
Try this one:
unsigned char i;

i = condition1;
i += condition2;
i += condition3;
if (i & (unsigned char)0x02)
{
    /*
       At least 2 conditions are true:
       0b00 - 0 conditions are true
       0b01 - 1 condition is true
       0b10 - 2 conditions are true
       0b11 - 3 conditions are true
       So checking the 2nd least significant bit is good enough.
    */
}

Which of the following would be more efficient?

In C:
Let's say the function myfunction() has 50 lines of code in which other, smaller functions also get called. Which one of the following would be more efficient?
void myfunction(long *a, long *b);
int i;
for(i=0;i<8;i++)
myfunction(&a, &b);
or
myfunction(&a, &b);
myfunction(&a, &b);
myfunction(&a, &b);
myfunction(&a, &b);
myfunction(&a, &b);
myfunction(&a, &b);
myfunction(&a, &b);
myfunction(&a, &b);
Any help would be appreciated.
That's premature optimization, you just shouldn't care...
Now, from a code maintenance point of view the first form (with the loop) is definitely better.
From a run-time point of view, if the function is inline and defined in the same compilation unit, with a compiler that does not unroll the loop itself, and if the code is already in the instruction cache (I can't speak for moon phases, but I still believe they shouldn't have any noticeable effect), the second one may be marginally faster.
As you can see, there are many conditions required for it to be faster, so you shouldn't do that. There are probably many other things to optimize in your program that would have a much greater effect on speed than this one. Any change that affects the algorithmic complexity of the program will have a much greater effect. Generally speaking, any code change that does not affect algorithmic complexity is probably premature optimization.
If you really want to be sure, measure. On x86 you can use the kind of trick I used in this question to get a fairly accurate measure. The trick is to read a processor register that counts the number of cycles spent. The question also illustrates how code optimization questions can become tricky, even for very simple problems.
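On GCC or Clang targeting x86, a rough sketch of that cycle-counting trick (ignoring serialisation, frequency scaling and other caveats) could be:

#include <stdio.h>
#include <x86intrin.h>   /* __rdtsc() on GCC/Clang */

int main(void)
{
    unsigned long long start = __rdtsc();   /* read the time-stamp counter */

    /* ... code under test goes here ... */

    unsigned long long end = __rdtsc();
    printf("elapsed: %llu cycles\n", end - start);
    return 0;
}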
I'd assume the compiler will translate the first variant into the second.
The first. Any half-decent compiler will optimize that for you. It's easier to read/understand and easier to write.
Secondly, write first, optimize second. Even if your compiler were completely brain dead, hand-unrolling would at best save you a few nanoseconds (or milliseconds) on a modern CPU. Chances are there are bigger bottlenecks in your application that could/should be optimized first.
It depends on so many things your best bet is to do it both ways and measure.
It would take less (of your) time to write out the for loop. I'd also say it's clearer to read with the loop. It would probably save a few instructions to write them out, but with modern processors and compilers it may amount to exactly the same result...
The first. It is easier to read.
First, are you sure you have a code execution performance problem? If you don't, then you're talking about making your code less readable and writable for no reason at all.
Second, have you profiled your program to see if this is in a place where it will take a significant amount of time? Humans are very bad at guessing the hot spots in programs, and without profiling you're likely to spend time and effort fiddling with things that don't make a difference.
Third, are you going to check the assembler code produced to see if there's a difference? If you're using an optimizing compiler with optimizations on, it's likely to produce what it sees fit for either. If you aren't, and you have a performance problem, get a better computer or turn on more optimizations.
Fourth, if there is a difference, are you going to test both ways to see which is better? On at least a representative sample of the systems your users will be running on?
And, to give you my best answer to which is more efficient: it depends. If they're in fact compiled to different code, the unrolled version might be faster because it doesn't have the loop overhead (which includes a conditional branch), and the rolled-up version might be faster because it's shorter code and will work better in the instruction cache. The usual wisdom was to unroll, but I once sped up a long-running section by rolling the execution up as tightly as I could.
On modern processors the size of compiled code becomes very important. If this loop could run entirely from the processor's cache, it would be the fastest solution. As n8wrl said, test it yourself.
I created a short test for this, with surprising results. At least for me, anyway, I would've thought it was the other way round.
So, I wrote two versions of a program repeatedly calling a function nothing() that did nothing interesting (an increment on a variable).
The first used proper loops (a million iterations of 1000 iterations, two nested fors), the second one did a million iterations of 1000 consecutive calls to nothing().
I used the time command to measure. The version with the proper loop took about 3.5 seconds on average, and the consecutive calling version took about 2.5 seconds on average.
I then tried to compile with optimization flags, but gcc detected that the program did essentially nothing and execution was instantaneous on both versions =P. Didn't bother fixing that.
Edit: if you were actually thinking of writing 8 consecutive calls in your code, please don't. Remember the famous quote: "Programs must be written for people to read, and only incidentally for machines to execute."
Also note that my tests did nothing except nothing() (=P) and are no proper benchmarks to consider in any actual program.
Loop unrolling can make execution faster (otherwise Duff's Device wouldn't have been invented), but that's a function of so many variables (processor, cache size, compiler settings, what myfunction is actually doing, etc.) that you can't rely on it to always be true, or for whatever improvement to be worth the cost in readability and maintainability. The only way to know for sure if it makes a difference for your particular platform is to code up both versions and profile them.
Depending on what myfunction actually does, the difference could be so far down in the noise as to be undetectable.
This kind of micro-optimization should only be done if all of the following are true:
You're failing to meet a hard performance requirement;
You've already picked the proper algorithm and data structure for the problem at hand (e.g., in the average case a poorly optimized Quicksort will beat the pants off of a highly optimized bubble sort, and in the worst case they'll be equally bad);
You're compiling with the highest level of optimization that the compiler offers.
How much does myFunction(long *a, long *b) do?
If it does much more than *a = *b + 1; the cost of calling the function can be so small compared to what goes on inside the function that you are really focusing on the wrong place.
On the other hand, in the overall picture of your application program, what percent of time is spent in these 8 calls? If it's not very much, then it won't make much difference no matter how tightly you optimize it.
As others say, profile, but that's not necessarily as simple as it sounds. Here's the method I and some others use.

C syntax or binary optimized syntax?

Let's take a simple example of two lines supposedly doing the same thing:
if (value >= 128 || value < 0)
...
or
if (value & ~127)
...
Say ifs are costly in a loop of thousands of iterations; is it better to stick with the traditional C syntax or to find a binary-optimized one if possible?
I would use the first statement, with traditional syntax, as it is more readable.
The second statement is hard on the eyes.
Care about the programmers who will use the code after you.
In 99 cases out of 100, do the one which is more readable and expresses your intent better.
In theory, compilers will do this sort of optimization for you. In practice, they might not. This particular example is a little bit subtle, because the two are not equivalent unless you make some assumptions about value and about whether signed arithmetic is 2's complement on your target platform.
Use whichever you find more readable. If and when you have evidence that the performance of this particular test is critical, use whatever gives you the best performance. Personally, I would probably write:
if ((unsigned int)value >= 128U)
because that's more intuitive to me, and more likely to get handled well by most compilers I've worked with.
It depends on how/where the check is done. If the check is being done once during program start-up to check a command line parameter, then the performance issue is completely moot and you should use whatever is more natural.
On the other hand, if the check was inside some inner loop that is happening millions of times a second, then it may matter. But don't assume one will be better; you should create both versions and time them to see if there is any measurable difference between the two.

Micro-optimizations in C: which ones are there, and are any of them really useful? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I understand most of the micro-optimizations out there but are they really useful?
Exempli gratia: does doing ++i instead of i++, or using while(1) rather than for(;;), really result in performance improvements (either in memory footprint or CPU cycles)?
So the question is, what micro-optimizations can be done in C? Are they really useful?
You should rely on your compiler to optimise this stuff. Concentrate on using appropriate algorithms and writing reliable, readable and maintainable code.
The day tclhttpd, a webserver written in Tcl, one of the slowest scripting languages, managed to outperform Apache, a webserver written in C, one of the supposedly fastest compiled languages, was the day I was convinced that micro-optimizations pale significantly in comparison to using a faster algorithm/technique.*
Never worry about micro-optimizations until you can prove in a debugger that they are the problem. Even then, I would recommend first coming here to SO and asking if it is a good idea, hoping someone will convince you not to do it.
It is counter-intuitive, but very often code, especially tight nested loops or recursion, is optimized by adding code rather than removing it. The gaming industry has come up with countless tricks to speed up nested loops, using filters to avoid unnecessary processing. Those filters add significantly more instructions than the difference between i++ and ++i.
*Note: We have learned a lot since then. The realization that a slow scripting language can outperform compiled machine code because spawning threads is expensive led to the development of lighttpd, NginX and Apache2.
There's a difference, I think, between a micro-optimization, a trick, and alternative means of doing something. It can be a micro-optimization to use ++i instead of i++, though I would think of it as merely avoiding a pessimization, because when you pre-increment (or decrement) the compiler need not insert code to keep track of the current value of the variable for use in the expression. If using pre-increment/decrement doesn't change the semantics of the expression, then you should use it and avoid the overhead.
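To make that concrete, a small sketch of where the semantic difference shows up (it only matters when the value of the expression is actually used):

void increment_demo(void)
{
    int i = 5, a, b;

    a = i++;   /* a == 5, i == 6: the old value has to be kept around */
    i = 5;
    b = ++i;   /* b == 6, i == 6: no old value needs to be preserved  */

    (void)a; (void)b;
    /* As standalone statements, i++; and ++i; are equivalent in C. */
}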
A trick, on the other hand, is code that uses a non-obvious mechanism to achieve a result faster than a straight-forward mechanism would. Tricks should be avoided unless absolutely needed. Gaining a small percentage of speed-up is generally not worth the damage to code readability unless that small percentage reflects a meaningful amount of time. Extremely long-running programs, especially calculation-heavy ones, or real-time programs are often candidates for tricks because the amount of time saved may be necessary to meet the system's performance goals. Tricks should be clearly documented if used.
Alternatives are just that. There may be little or no performance gain; they just represent two different ways of expressing the same intent. The compiler may even produce the same code. In this case, choose the most readable expression. I would say to do so even if it results in some performance loss (though see the preceding paragraph).
I think you do not need to think about these micro-optimizations, because most of them are done by the compiler. These things can only make code more difficult to read.
Remember: premature optimization is evil.
To be honest, that question, while valid, is not relevant today. Why?
Compiler writers are a lot smarter than they were 20 years ago. Rewind back in time and these optimizations would have been very relevant; we were all working with old 80286/386 processors, and coders would often resort to tricks to squeeze even more bytes out of the compiled code.
Today, processors are too fast, and compiler writers know the intimate details of the operand instructions to make everything work; consider that there is pipelining, multiple cores and acres of RAM. Remember, with an 80386 processor, there would be 4 MB of RAM, and if you were lucky, 8 MB was considered superior!
The paradigm has shifted: it was about squeezing every byte out of compiled code, now it is more about programmer productivity and getting the release out the door sooner.
In the above, where I talk about processors and compilers, I was referring to the Intel 80x86 processor family and Borland/Microsoft compilers.
Hope this helps,
Best regards,
Tom.
If you can easily see that two different code sequences produce identical results, without making assumptions about the data other than what's present in the code, then the compiler can too, and generally will.
It's only when the transformation from one to the other is highly non-obvious or requires assuming something that you may know to be true but the compiler has no way to infer (e.g., that an operation cannot overflow, or that two pointers will never alias even though they aren't declared with the restrict keyword) that you should spend time thinking about these things. Even then, the best thing to do is usually to find a way to inform the compiler about the assumptions that it can make.
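For the pointer-aliasing case just mentioned, informing the compiler usually means the C99 restrict qualifier; a minimal sketch:

/* Promising that dst and src never overlap lets the compiler reorder
   or vectorise the loads and stores more aggressively. */
void scale_into(float *restrict dst, const float *restrict src, float k, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = k * src[i];
}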
If you do find specific cases where the compiler misses simple transformations, 99% of the time you should just file a bug against the compiler and get on with working on more important things.
Keeping the fact that memory is the new disk in mind will likely improve your performance far more than applying any of those micro-optimizations.
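As one concrete example of what "memory is the new disk" means in practice, here is a sketch of traversing a 2D array in the order it is laid out in memory (row-major in C), which is far kinder to the cache than striding down columns:

#define ROWS 1024
#define COLS 1024

long sum_row_major(int a[ROWS][COLS])
{
    long sum = 0;
    for (int r = 0; r < ROWS; r++)        /* outer loop over rows */
        for (int c = 0; c < COLS; c++)    /* inner loop walks contiguous memory */
            sum += a[r][c];
    return sum;
}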
For a slightly more pragmatic take on the question of ++i vs. i++ (at least in a C++ context) see http://llvm.org/docs/CodingStandards.html#micro_preincrement.
If Chris Lattner says it, I've got to pay attention. ;-)
You would do better to consider every program you write primarily as a language in which you communicate your ideas, intentions and reasoning to other human beings who will have to bug-fix, reuse and understand it. They will spend more time decoding garbled code than any compiler or runtime system will spend executing it.
To summarise, say what you mean in the clearest way, using the common idioms of the language in question.
For these specific examples in C, for(;;) is the idiom for an infinite loop and "i++" is the usual idiom for "add one to i" unless you use the value in an expression, in which case it depends whether the value with the clearest meaning is the one before or after the increment.
Here's real optimization, in my experience.
Someone on SO once remarked that micro-optimization was like "getting a haircut to lose weight". On American TV there is a show called "The Biggest Loser" where obese people compete to lose weight. If they were able to get their body weight down to a few grams, then getting a haircut would help.
Maybe that's overstating the analogy to micro-optimization, because I have seen (and written) code where micro-optimization actually did make a difference, but when starting off there is a lot more to be gained by simply not solving problems you don't have.
/* The classic XOR swap: exchanges x and y without a temporary.
   Note that it breaks if x and y are the same object, and on modern
   compilers it is usually no faster than a plain swap via a temporary. */
x ^= y;
y ^= x;
x ^= y;
++i should be preferred over i++ in situations where you don't use the return value, because it better represents the semantics of what you are trying to do (increment i) rather than any possible optimisation (it might be slightly faster, and is probably not worse).
Generally, loops that count towards zero are faster than loops that count towards some other number. I can imagine a situation where the compiler can't make this optimization for you, but you can make it yourself.
Say that you have an array of length x, where x is some very big number, and that you need to perform some operation on each element of the array. Further, let's say that you don't care what order these operations occur in. You might do this...
int i;
for (i = 0; i < x; i++)
    doStuff(array[i]);
But you could get a little optimization by doing it this way instead:
int i;
for (i = x-1; i != 0; i--)
{
    doStuff(array[i]);
}
doStuff(array[0]);
The compiler doesn't do it for you because it can't assume that order is unimportant.
MaR's example code is better. Consider this, assuming doStuff() returns an int:
int i = x;
while (i != 0)
{
    --i;
    printf("%d\n", doStuff(array[i]));
}
This is ok as long as printing the array contents in reverse order is acceptable, but the compiler can't decide that for you.
Whether this is an optimization is hardware-dependent. From what I remember about writing assembler (many, many years ago), counting up rather than counting down to zero requires an extra machine instruction each time you go through the loop.
If your test is something like (x < y), then evaluation of the test goes something like this:
subtract y from x, storing the result in some register r1
test r1, to set the n and z flags
branch based on the values of the n and z flags
If your test is ( x != 0), you can do this:
test x, to set the z flag
branch based on the value of the z flag
You get to skip a subtract instruction for each iteration.
There are architectures where the subtract or decrement instruction itself sets the flags based on its result, so even the explicit test can be skipped; x86 is actually one of them (SUB and DEC set the zero flag), which is why compilers targeting it can fold the loop-counter decrement and the end-of-loop test together.
