I have a loop with a switch inside that looks something like this (but much more complex).
for (int i = 0; i < N; i += inc)
{
    v = 0.0;
    switch (ConstVal)
    {
    case 2: v += A[i];
    case 1: v += A[i+k];
    case 0: v += A[i+j];
    }
    // do some stuff with v
}
ConstVal is unknown at compile time but fixed during the initialization routine. Is there any way to remove the switch statement without compiling multiple variants of the for loop? Given that x86 has indirect branching, there should be a simple way to use inline assembly to jump directly to the case I want, rather than re-evaluating the switch at the top of the loop on each iteration. How would you do this (in gcc)? Finally, can this be done without interfering with the compiler's optimization analysis? I'm already manually unrolling loops, but I'm sure there are lots more optimizations going on that I don't want to break.
It's my understanding that Julia's meta-programming feature gives you access to the parser and abstract syntax tree. In combination with a JIT, you can resolve this kind of problem. I would think there would be some reasonable workaround in C even without language semantics for an indirect branch. Note that Duff's device is not a solution, since I want to return to the same case label on each loop iteration. This issue comes up frequently.
EDIT
I discovered that x86 has no conditional indirect branch instruction. Further, gcc inline assembly only allows fixed labels. And yet, using gcc extensions, this can still be done. See, for example, https://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html#Labels-as-Values.
This is how. In my code, it was difficult to determine whether there was any performance difference, but on another machine, or with a much smaller and simpler loop, it might make a difference.
void *forswitch;
switch (ConstVal)
{
case 2: forswitch = &&C; break;
case 1: forswitch = &&B; break;
case 0: forswitch = &&A; break;
}
void *_forswitch[] = {forswitch, &&END_FOR_SWITCH};
i = 0;
{
    v = 0.0;
C:  v += _A[i];
B:  v += _A[i+k];
A:  v += _A[i+j];
    // do some stuff with v
    i += inc;
    forswitch = _forswitch[i==N];
    //forswitch = (i<N)? forswitch: &&END_FOR_SWITCH;
    goto *forswitch;
}
END_FOR_SWITCH:
return;
I've replaced the for loop with my own implementation based on the gcc extension that gives access to machine-level indirect branching. There are a couple of ways to do this. The first is to index an array that holds either the jump target for the conditional start of the loop or the target for the end of the loop, depending on the loop index. The other way (commented out above) is to conditionally set the branch target each time; the compiler should replace that branch with a conditional move (CMOV).
There are a number of obvious issues with this solution. (1) It's not portable. (2) By implementing the loop myself, the code is not only harder to understand, but it may also interfere with compiler optimizations (such as automatic loop unrolling). (3) The compiler can't jointly optimize the entire switch statement, even though there are no breaks, because it has no knowledge at compile time of which statements will actually be executed. However, it may be able to cleverly re-organize the switch in a manner similar to what others have pointed out in some of the responses below. By implementing the switch manually (in combination with the for loop), I'm making any such optimization much more difficult, since by removing the switch semantics my intent is obscured from the optimizer.
Nevertheless, if it made a significant performance improvement, I still think this would be better than having multiple copies of the code. With macros, the non-portable extensions could probably be compiled conditionally; this could basically be made to look like a normal loop.
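For illustration, a minimal sketch of that conditional compilation (the macro name HAVE_COMPUTED_GOTO and the exact shape of the fallback are my own assumptions, not tested code):

/* Sketch only: use the computed-goto loop when the GCC labels-as-values
   extension is available, and fall back to the plain switch otherwise. */
#if defined(__GNUC__)
#  define HAVE_COMPUTED_GOTO 1
#else
#  define HAVE_COMPUTED_GOTO 0
#endif

#if HAVE_COMPUTED_GOTO
    /* ... the forswitch / goto *forswitch implementation shown above ... */
#else
    for (int i = 0; i < N; i += inc)
    {
        v = 0.0;
        switch (ConstVal)
        {
        case 2: v += A[i];
        case 1: v += A[i+k];
        case 0: v += A[i+j];
        }
        // do some stuff with v
    }
#endif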
EDIT 2
I've found a much better solution that is more portable and more effective. When there is a small number of possible run-time-determined options, you can write a wrapper around the optimized function, fix all the run-time constants, and inline the function once for each set of constants. If there is only a single constant, you can use a lookup table of function pointers, each entry of which fixes the constant and inlines the function. For a more complex situation you'll need some if-else-if control structure. One of the variants can be left with all variables free, so there is no loss of generality. I think of this as a sort of compile-time closure. The compiler does all the hard work, without any messy macros or duplicated code to maintain.
In my code, this resulted in a 10% to 20% performance increase on already significantly optimized code (mostly due to the hard-coding of various constants, and not really anything to do with the switch itself). In my toy example, the change would look something like this.
inline void __foo(const int ConstVal)
{
    for (int i = 0; i < N; i += inc)
    {
        v = 0.0;
        switch (ConstVal)
        {
        case 2: v += A[i];
        case 1: v += A[i+k];
        case 0: v += A[i+j];
        }
        // do some stuff with v
    }
}

void foo()
{
    // this could be done with a lookup table
    switch (ConstVal)
    {
    case 2: __foo(2); break;
    case 1: __foo(1); break;
    case 0: __foo(0); break;
    }
}
By inlining __foo, the compiler will eliminate the switch and fold away any other constants that you pass along. You will, of course, end up with more compiled code, but for a small optimized routine this shouldn't be a big deal.
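For what it's worth, the lookup-table variant mentioned in the comment above could look roughly like this; it replaces the switch-based foo, and the wrapper names plus the assumption that ConstVal is always 0, 1 or 2 are mine:

// Sketch of the lookup-table dispatch: one thin wrapper per constant,
// selected once by indexing an array of function pointers.
typedef void (*foo_fn)(void);

static void foo0(void) { __foo(0); }
static void foo1(void) { __foo(1); }
static void foo2(void) { __foo(2); }

static foo_fn const foo_table[] = { foo0, foo1, foo2 };

void foo(void)
{
    foo_table[ConstVal]();   // assumes ConstVal is in {0, 1, 2}
}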
No, I don't see any way to optimize the switch statement away. Besides, it is not that expensive. Because there are no break statements, the switch falls through, so it translates to:
switch(ConstVal)
{
case 2: v= A[i] + A[i+k] + A[i+j]; break;
case 1: v= A[i+k] + A[i+j]; break;
case 0: v= A[i+j]; break;
}
// do some stuff with v
and I don't see how to remove the dependency on ConstVal.
You could place the switch before the loop and write 3 loops, one for each value of ConstVal, but that will surely look like ugly code, depending on what "do some stuff with v" does.
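To make that concrete, a rough sketch of the three-loop version (the duplicated loop body is exactly the ugliness being traded for):

// Sketch: hoist the switch out of the loop at the cost of three loop copies.
switch (ConstVal)
{
case 2:
    for (int i = 0; i < N; i += inc) { v = A[i] + A[i+k] + A[i+j]; /* do some stuff with v */ }
    break;
case 1:
    for (int i = 0; i < N; i += inc) { v = A[i+k] + A[i+j]; /* do some stuff with v */ }
    break;
case 0:
    for (int i = 0; i < N; i += inc) { v = A[i+j]; /* do some stuff with v */ }
    break;
}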
When do you know ConstVal, and how often does it change? If you can recompile a small routine and re-link the whole thing whenever ConstVal changes, that will solve your problem.
Having said that, do you know this is a problem?
Is that switch responsible for 10% or more of execution time?
It is very common for people to focus on non-problems.
They know they should "profile first" but they don't actually do it.
(It's a case of "ready-fire-aim" :)
A method many people rely on is this.
Is there any way to remove the switch statement without compiling multiple variants of the for loop?
In general it's better either to generate the different variants or to just leave it as is. But in this case maybe we can invent a trick, something like this:
switch (constVal) {
case 2:
    A1 = A;
    B1 = A + k;
    C1 = A + j;
    break;
case 1:
    A1 = big_zero_array;
    B1 = A + k;
    C1 = A + j;
    break;
case 0:
    A1 = B1 = big_zero_array;
    C1 = A + j;
    break;
}
for (int i = 0; i < N; i += inc) {
    v = A1[i] + B1[i] + C1[i];
    //....
}
Still, this approach requires additional memory and may even be slower under some circumstances.
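One practical detail of that trick: the zero array has to cover every index the loop can touch. A sketch, assuming double elements and non-negative j and k (the bounds N_MAX and OFF_MAX are placeholders of mine):

// The "disabled" terms still read big_zero_array[i], big_zero_array[i+k], ...,
// so the array must span the largest index used.
#define N_MAX   4096   /* hypothetical upper bound on N         */
#define OFF_MAX 64     /* hypothetical upper bound on max(j, k) */
static const double big_zero_array[N_MAX + OFF_MAX];   /* static => all zeros */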
Related
Is it possible for a C compiler to transform an 'if-else if' block into a 'switch' block to optimize the code?
The following code:
if ( a == 1 ) {
    block1
} else if ( a == 2 ) {
    block2
} else if ( ... ) {
    ....
} else if ( a == n ) {
    block n
}
is transformed during compilation to:
switch (a) {
case 1:
    block1
    break;
case 2:
    block2
    break;
...
case n:
    block n
    break;
}
If this is possible whenever it is feasible, is there still any advantage of 'if...else if' blocks over 'switch' blocks?
A compiler translates C code to machine code. At the assembly level, there is no such thing as if, else or switch. There are just conditional branches and unconditional branches.
So, no, the compiler will never do anything like what you are suggesting, because an if-else that can be replaced by a switch will already result in identical machine code, no matter which one you wrote. There is no middle "convert C to C" step.
For this reason, switch is kind of an obscure language feature, since it is in theory completely redundant. It is subjective whether a switch results in more readable or less readable code. It can do either, on a case-by-case basis.
Special case:
There exists a common optimization of switch statements that the compiler can do when all the case labels are adjacent integer constants: replacing the whole switch with a jump table, conceptually an array of function pointers. It can then simply use the switch condition as an array index to determine which function to call. No comparisons needed. This is quite a radical optimization, because it removes numerous comparisons and thereby speeds up the code, improves branch prediction, etc.
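Written out by hand, the shape of that optimization looks roughly like this (a sketch; the handler names are made up):

// A dense switch over 0..3 becomes one bounds check plus one indexed call,
// instead of a chain of comparisons.
typedef void (*handler_fn)(void);

static void handle0(void) { /* case 0 */ }
static void handle1(void) { /* case 1 */ }
static void handle2(void) { /* case 2 */ }
static void handle3(void) { /* case 3 */ }

static handler_fn const handlers[] = { handle0, handle1, handle2, handle3 };

void dispatch(unsigned n)
{
    if (n < 4)           // single range check replaces the individual compares
        handlers[n]();
}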
The very same optimization is possible if you have a chain of if - else if, like so:
if(n==0)
{ }
else if(n==1)
{ }
else if(n==2)
{ }
...
This can also get optimized into an array of function pointers, just like if it were written as a switch.
A C compiler can make any optimisations it likes, so long as the observable behaviour of the program is not altered in any way.
Noting that a switch can only occur on integral types, the conversion to a switch in your case does appear to be optimal. Trust your optimiser to make such a transformation if appropriate. You can always inspect the output assembly to verify.
In case you've never heard about the "as-if" rule: here it is. That means the compiler is allowed to turn an if into a switch on certain occasions, yes.
switch should be used whenever possible; it gives the compiler more information about the controlling expression (one integer value compared against a set of constants) and enables it to construct something like a jump table.
if should be used when switch cannot be used. That's as easy as it gets.
I have code that looks like this:
void bar(unsigned long k);   /* declared before use */

void foo(unsigned long k)
{
    if (k & 1)
    {
        bar(k/2 + 1);
        bar(k/2);
        bar(k/2 + 1);
    }
    else
    {
        bar(k/2);
        bar(k/2);
        bar(k/2);
    }
}
void bar(unsigned long k)
{
    switch (k)
    {
    default: special_default(); break;
    case 1:  specialbar1();     break;
    case 2:  specialbar2();     break;
    /* ... more cases ... */
    case 16: specialbar16();    break;
    }
}
The performance is much better when foo is called with an even value of k. Each of the specialbar#() functions uses several stack variables, and the number of such variables increases sharply as k increases. To be clear, specialbar#() makes use of about 3 * k local variables, all of which are unsigned long long.
For example, foo(32) executes about 15% faster than foo(31). I am using Visual Studio 2012, and the performance analysis assures me that two calls to specialbar16 and one call to specialbar15 take considerably more work than three consecutive calls to specialbar16.
Is it possible that the compiler takes advantage of the three consecutive calls when k is even? That is, can it realize that the stack is essentially the same over the three consecutive calls for even k yet the same optimization is not possible for odd k?
Is it possible that the compiler takes advantage of the three consecutive calls when k is even? That is, can it realize that the stack is essentially the same over the three consecutive calls for even k, yet the same optimization is not possible for odd k?
This hardly seems worthy of an answer but, yes, that's entirely possible. The compiler may recognize that the same stack layout is required for each call, due to it being the same function each time, and thus avoid the whole stack setup/teardown for each call. It is in this case probably also inlining the call - the code is generated in place in the caller.
Most likely similar optimization could be performed for the other case as well, though optimization is tricky and there are sometimes subtle reasons why a compiler won't be able to perform it.
Your foo function performs extra logic when k is odd: the + 1 in k/2 + 1.
To answer your specific question - can repeated calls improve performance? Yes, they can: when the parameters are the same, the branches taken within the function are the same, and this allows branch prediction to work optimally.
Compare the two:
if (strstr(a, "earth")) // A1
return x;
if (strstr(a, "ear")) // A2
return y;
and
if (strstr(a, "earth")) // B1
return x;
else if (strstr(a, "ear")) // B2
return y;
Personally, I feel that else is redundant and prevents the CPU from doing branch prediction.
In the first one, while executing A1, it is possible to pre-decode A2. In the second one, it will not look at B2 until B1 has evaluated to false.
I found a lot of (maybe most?) sources using the latter form.
Though, the latter form is easier to understand, because without the else clause it is not so obvious that return y will be reached only if a =~ /ear(?!th)/.
Your compiler probably knows that both these examples mean exactly the same thing. CPU branch prediction doesn't come into it.
I usually would choose the first option for symmetry.
(The following answers the original version of the question.)
Do you realize that the two code snippets are NOT semantically equivalent???
Consider what happens if a is "earth".
The first snippet calls foo() and then bar().
The second snippet calls foo() and skips the bar() call.
And this explains why the generated machine code is different. It has to be to implement the different semantics of the respective code fragments!
Personally, I feel that else is redundant ...
Unfortunately, your feeling is incorrect.
Lesson - write your code simply and clearly and leave optimization to the compiler ... which is going to do a far more accurate job than you can achieve.
FOLLOWUP
The snippets in the updated version of the question are now semantically identical, and the else is redundant. However:
any half decent optimizing compiler will generate identical code for the two snippets, and
it is a matter of opinion (i.e. subjective) which of the snippets is easier to understand.
Use else if to state your intentions clearly. Code is meant to be read by humans.
Let the compiler optimize this, and don't worry about optimization until your code is 1) working 2) crystal clear 3) profiled (do this in that order). When doing step 3, you'll notice that the bottlenecks are not where you supposed they would be.
Any attempt to control branch prediction or whatever low level stuff is silly: compilers are very good at optimizing and they use sophisticated methods to yield a fast code on your particular machine.
Look at output from LLVM based compilers to see what I mean: sometimes you can't even remotely understand what it does.
Usually it's better to use the second form if you want to test exactly one condition on a and get exactly one outcome, i.e. to narrow down the options for the variable or constant a. If you write two separate ifs, you can get two different outcomes.
For example, with the exact conditions you have there, let's say a = -2:
A:  if (a < 0)
        return x; // -2 is less than 0, so it will return x and it stops.
    else if (a < 100)
        return y;
B:  if (a < 0)
        return x; // -2 is less than 0, so it will return x and passes to the next if statement;
    if (a < 100)
        return y; // -2 is also less than 100 and it will return y too
Why not simply write
char* str = strstr(a, "ear");
if (str != NULL)
{
    foo();
    if (strstr(str, "earth") != NULL)
    {
        bar();
    }
}
Many times I need to do things TWICE in a for loop. Simply, I can set up a for loop with an iterator and go through it twice:
for (i = 0; i < 2; i++)
{
// Do stuff
}
Now I am interested in doing this as SIMPLY as I can, perhaps without an initializer or iterator? Are there any other, really simple and elegant, ways of achieving this?
This is elegant because it looks like a triangle; and triangles are elegant.
i = 0;
here: dostuff();
i++; if ( i == 1 ) goto here;
Encapsulate it in a function and call it twice.
void do_stuff() {
// Do Stuff
}
// .....
do_stuff();
do_stuff();
Note: if you use variables or parameters of the enclosing function in the stuff logic, you can pass them as arguments to the extracted do_stuff function.
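A tiny illustration of that note (the names are placeholders):

// Locals used by the stuff become parameters of the extracted function.
static void do_stuff(int x, double *acc)
{
    *acc += x;   /* stand-in for the real work */
}

void caller(void)
{
    int x = 42;
    double acc = 0.0;
    do_stuff(x, &acc);
    do_stuff(x, &acc);
}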
If it's only twice, and you want to avoid a loop, just write the darn thing twice.
statement1;
statement1; // (again)
If the loop is too verbose for you, you can also define an alias for it:
#define TWICE for (int _index = 0; _index < 2; _index++)
This would result into that code:
TWICE {
// Do Stuff
}
// or
TWICE
func();
I would only recommend using this macro if you have to do this very often; otherwise I think the plain for loop is more readable.
Unfortunately, this is for C++ only, not for C, but it does exactly what you want:
Just include the header, and you can write something like this:
10 times {
// Do stuff
}
I'll try to rewrite it for C as well.
So, after some time, here's an approach that enables you to write the following in pure C:
2 times {
    do_something();
}
Example:
You'll have to include this little thing as a simple header file (I always called the file extension.h). Then, you'll be able to write programs in the style of:
#include <stdio.h>
#include "extension.h"

int main(int argc, char** argv){
    3 times printf("Hello.\n");
    3 times printf("Score: 0 : %d\n", _);
    2 times {
        printf("Counting: ");
        9 times printf("%d ", _);
        printf("\n");
    }
    5 times {
        printf("Counting up to %d: ", _);
        _ times printf("%d ", _);
        printf("\n");
    }
    return 0;
}
Features:
Simple notation of simple loops (in the style depicted above)
Counter is implicitly stored in a variable called _ (a simple underscore).
Nesting of loops allowed.
Restrictions (and how to (partially) circumvent them):
Works only for a certain number of loops (which is - "of course" - reasonable, since you only would want to use such a thing for "small" loops). Current implementation supports a maximum of 18 iterations (higher values result in undefined behaviour). Can be adjusted in header file by changing the size of array _A.
Only a certain nesting depth is allowed. Current implementation supports a nesting depth of 10. Can be adjusted by redefining the macro _Y.
Explanation:
You can see the full (= de-obfuscated) source code here; a greatly simplified sketch of the core idea also follows after this explanation. Let's say we want to allow up to 18 loops.
Retrieving the upper iteration bound: The basic idea is to have an array of chars that are initially all set to 0 (this is the array counterarray). If we issue a call to e.g. 2 times {do_it;}, the macro times shall set the second element of counterarray to 1 (i.e. counterarray[2] = 1). In C, it is possible to swap index and array name in such an expression, so we can write 2[counterarray] = 1 to achieve the same thing. This is exactly what the macro times does as its first step. Later, we can scan counterarray until we find an element that is not 0 but 1; the corresponding index is then the upper iteration bound. It is stored in the variable searcher. Since we want to support nesting, we have to store the upper bound for each nesting depth separately; this is done by searchermax[depth]=searcher+1.
Adjusting current nesting depth: As said, we want to support nesting of loops, so we have to keep track of the current nesting depth (done in the variable depth). We increment it by one if we start such a loop.
The actual counter variable: We have a "variable" called _ that implicitly gets assigned the current counter. In fact, we store one counter for each nesting depth (all stored in the array counter). Then, _ is just another macro that retrieves the proper counter for the current nesting depth from this array.
The actual for loop: We break the for loop into parts:
We initialize the counter for the current nesting depth to 0 (done by counter[depth] = 0).
The iteration step is the most complicated part: We have to check whether the loop at the current nesting depth has reached its end. If so, we have to update the nesting depth accordingly. If not, we have to increment the current nesting depth's counter by 1. The variable lastloop is 1 if this is the last iteration and 0 otherwise, and we adjust the current nesting depth accordingly. The main problem here is that we have to write this as a sequence of expressions, all separated by commas, which forces us to write all these conditions in a very non-straightforward way.
The "increment step" of the for loop consists of only one assignment, that increments the appropriate counter (i.e. the element of counter of the proper nesting depth) and assigns this value to our "counter variable" _.
What about this??
void DostuffFunction(){}
for (unsigned i = 0; i < 2; ++i, DostuffFunction());
Regards,
Pablo.
What abelenky said.
And if your { // Do stuff } is multi-line, make it a function, and call that function -- twice.
Many people suggest writing out the code twice, which is fine if the code is short. There is, however, a size of code block which would be awkward to copy but is not large enough to merit its own function (especially if that function would need an excessive number of parameters). My own normal idiom to run a loop 'n' times is
i = number_of_reps;
do
{
... whatever
} while(--i);
In some measure because I'm frequently coding for an embedded system where the up-counting loop is often inefficient enough to matter, and in some measure because it's easy to see the number of repetitions. Running things twice is a bit awkward because the most efficient coding on my target system
bit rep_flag;
rep_flag = 0;
do
{
...
} while(rep_flag ^= 1); /* Note: if loop runs to completion, leaves rep_flag clear */
doesn't read terribly well. Using a numeric counter suggests the number of reps can be varied arbitrarily, which in many instances won't be the case. Still, a numeric counter is probably the best bet.
As Edsger W. Dijkstra himself put it: "two or more, use a for". No need to be any simpler.
Another attempt:
for(i=2;i--;) /* Do stuff */
This solution has many benefits:
Shortest form possible, I claim (13 chars)
Still, readable
Includes initialization
The amount of repeats ("2") is visible in the code
Can be used as a toggle (1 or 0) inside the body e.g. for alternation
Works with single instruction, instruction body or function call
Flexible (doesn't have to be used only for "doing twice")
Dijkstra compliant ;-)
From comment:
for (i=2; i--; "Do stuff");
Use function:
func();
func();
Or use macro (not recommended):
#define DO_IT_TWICE(A) A; A
DO_IT_TWICE({ x+=cos(123); func(x); })
If your compiler supports this just put the declaration inside the for statement:
for (unsigned i = 0; i < 2; ++i)
{
// Do stuff
}
This is as elegant and efficient as it can be. Modern compilers can do loop unrolling and all that stuff, trust them. If you don't trust them, check the assembler.
And it has one little advantage to all other solutions, for everybody it just reads, "do it twice".
Assuming C++0x lambda support:
template <typename T> void twice(T t)
{
t();
t();
}
twice([](){ /*insert code here*/ });
Or:
twice([]()
{
/*insert code here*/
});
Which doesn't help you since you wanted it for C.
Good rule: three or more, do a for.
I think I read that in Code Complete, but I could be wrong. So in your case you don't need a for loop.
This is the shortest possible without preprocessor/template/duplication tricks:
for(int i=2; i--; ) /*do stuff*/;
Note that the decrement happens once right at the beginning, which is why this will loop precisely twice with the indices 1 and 0 as requested.
Alternatively you can write
for(int i=2; i--; /*do stuff*/) ;
But that's purely a difference of taste.
If what you are doing is somewhat complicated, wrap it in a function and call that function twice. (This depends on how many local variables your do stuff code relies on.)
You could do something like
void do_stuff(int i){
// do stuff
}
do_stuff(0);
do_stuff(1);
But this may get extremely ugly if you are working on a whole bunch of local variables.
//dostuff
stuff;
//dostuff (Attention: I am doing the same stuff for the 2nd time)
stuff;
First, use a comment
/* Do the following stuff twice */
then,
1) use the for loop
2) write the statement twice, or
3) write a function and call the function twice
do not use macros, as earlier stated, macros are evil.
(My answer's almost a triangle)
What is elegance? How do you measure it? Is someone paying you to be elegant? If so how do they determine the dollar-to-elegance conversion?
When I ask myself, "how should this be written," I consider the priorities of the person paying me. If I'm being paid to write fast code, control-c, control-v, done. If I'm being paid to write code fast, well.. same thing. If I'm being paid to write code that occupies the smallest amount of space on the screen, I short the stock of my employer.
A jump instruction is fairly slow, so if you write the lines one after the other it can run faster than writing a loop. But modern compilers are very, very smart and the optimizations are great (if they are enabled, of course). If you have turned on your compiler's optimizations, you don't need to care which way you write it - with a loop or not (:
EDIT: http://en.wikipedia.org/wiki/compiler_optimizations - just take a look (:
Close to your example, elegant and efficient:
for (i = 2; i; --i)
{
/* Do stuff */
}
Here's why I'd recommend that approach:
It initializes the iterator to the number of iterations, which makes intuitive sense.
It uses decrement over increment so that the loop test expression is a comparison to zero (the "i;" can be interpreted as "is i true?" which in C means "is i non-zero"), which may optimize better on certain architectures.
It uses pre-decrement as opposed to post-decrement in the counting expression for the same reason (may optimize better).
It uses a for loop instead of do/while or goto or XOR or switch or macro or any other trick approach because readability and maintainability are more elegant and important than clever hacks.
It doesn't require you to duplicate the code for "Do stuff" so that you can avoid a loop. Duplicated code is an abomination and a maintenance nightmare.
If "Do stuff" is lengthy, move it into a function and give the compiler permission to inline it if beneficial. Then call the function from within the for loop.
I like Chris Case's solution (up here), but the C language doesn't have default parameters.
My solution:
bool cy = false;
do {
// Do stuff twice
} while (cy = !cy);
If you want, you could do different things in the two cycles by checking the boolean variable (maybe with the ternary operator).
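Something along these lines (a sketch; the printf messages stand in for the real work):

#include <stdbool.h>
#include <stdio.h>

static void run(void)
{
    bool cy = false;
    do {
        // the flag distinguishes the first pass from the second
        printf(cy ? "second pass\n" : "first pass\n");
    } while (cy = !cy);   /* assignment (not ==) is intentional: the body runs exactly twice */
}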
void loopTwice (bool first = true)
{
// Recursion is your friend
if (first) {loopTwice(false);}
// Do Stuff
...
}
I'm sure there's a more elegant way, but this is simple to read, and pretty simply to write. There might even be a way to eliminate the bool parameter, but this is what I came up with in 20 seconds.
I'm writing a loop in C, and I am just wondering on how to optimize it a bit. It's not crucial here as I'm just practicing, but for further knowledge, I'd like to know:
In a loop, for example the following snippet:
int i = 0;
while (i <= 10) {
printf("%d\n", i);
i++;
}
Does the processor check both (i < 10) and (i == 10) for every iteration? Or does it just check (i < 10) and, if it's true, continue?
If it checks both, wouldn't:
int i = 0;
while (i != 10) {
printf("%d\n", i);
i++;
}
be more efficient?
Thanks!
Both will be translated into a single assembly instruction. Most CPUs have comparison instructions for LESS THAN, for LESS THAN OR EQUAL, for EQUAL and for NOT EQUAL.
One of the interesting things about these optimization questions is that they often show why you should code for clarity/correctness before worrying about the performance impact of these operations (which oh-so-often make no difference at all).
Your 2 example loops do not have the same behavior:
int i = 0;
/* this will print 11 lines (0..10) */
while (i <= 10) {
printf("%d\n", i);
i++;
}
And,
int i = 0;
/* This will print 10 lines (0..9) */
while (i != 10) {
printf("%d\n", i);
i++;
}
To answer your question though, it's nearly certain that the performance of the two constructs would be identical (assuming that you fixed the problem so the loop counts were the same). For example, if your processor could only check for equality and whether one value were less than another in two separate steps (which would be a very unusual processor), then the compiler would likely transform the (i <= 10) to an (i < 11) test - or maybe an (i != 11) test.
This is a clear example of early optimization.... IMHO, that is something that programmers new to their craft are way too prone to worry about. If you must worry about it, learn to benchmark and profile your code so that your worries are based on evidence rather than supposition.
Speaking to your specific questions. First, a <= is not implemented as two operations testing for < and == separately in any C compiler I've met in my career. And that includes some monumentally stupid compilers. Notice that for integers, a <= 5 is the same condition as a < 6 and if the target architecture required that only < be used, that is what the code generator would do.
Your second concern, that while (i != 10) might be more efficient raises an interesting issue of defensive programming. First, no it isn't any more efficient in any reasonable target architecture. However, it raises a potential for a small bug to cause a larger failure. Consider this: if some line of code within the body of the loop modified i, say by making it greater than 10, what might happen? How long would it take for the loop to end, and would there be any other consequences of the error?
Finally, when wondering about this kind of thing, it often is worthwhile to find out what code the compiler you are using actually generates. Most compilers provide a mechanism to do this. For GCC, learn about the -S option which will cause it to produce the assembly code directly instead of producing an object file.
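For example (the file name is just a placeholder):

/* Inspect what the compiler actually generates (GCC):
 *
 *     gcc -O2 -S loop.c                 writes the assembly to loop.s
 *     gcc -O2 -S -fverbose-asm loop.c   same, with annotated assembly
 */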
The operators <= and < each map to a single instruction in assembly; there should be no performance difference.
Note that testing for 0 can be a bit faster on some processors than testing against any other constant, so it can be reasonable to make the loop run backward:
int i = 10;
while (i != 0)
{
printf("%d\n", i);
i--;
}
Note that micro-optimizations like these usually gain you only very little extra performance; your time is better spent on using efficient algorithms.
Does the processor check both (i < 10) and (i == 10) for every iteration? Or does it just check (i < 10) and, if it's true, continue?
Neither; it will most likely check (i < 11). The <= 10 is just there for you, to give better meaning to your code, since 11 would be a magic number that actually means (10+1).
Depends on the architecture and compiler. On most architectures, there is a single instruction for <= or the opposite, which can be negated, so if it is translated into a loop, the comparison will most likely be only one instruction. (On x86 or x86_64 it is one instruction)
The compiler might unroll the loop into a sequence of ten i++ steps; when only constant expressions are involved, it will even optimize the ++ away and leave only constants.
And Ira is right: the cost of the comparison vanishes when there is a printf involved, whose execution time might be millions of clock cycles.
I'm writing a loop in C, and I am just wondering on how to optimize it a bit.
If you compile with optimizations turned on, the biggest optimization will be from unrolling that loop.
It's going to be hard to profile that code with -O2, because for trivial functions the compiler will unroll the loop and you won't be able to benchmark actual differences in compares. You should be careful when profiling test cases that use constants that might make the code trivial when optimized by the compiler.
Disassemble it. Depending on the processor, the optimization level and a number of other things, this simple example code actually gets unrolled or does things that do not reflect your real question. Compiling both example loops you provided with gcc -O1 resulted in the same assembler (for ARM).
A less-than in your C code often turns into a branch-if-greater-than-or-equal to the far side of the loop. If your processor doesn't have a greater-than-or-equal branch, it may use a branch-if-greater-than plus a branch-if-equal, two instructions.
Typically, though, there will be a register holding i, an instruction to increment i, and then an instruction to compare i with 10. Equal-to, greater-than-or-equal and less-than are generally each done in a single instruction, so you should not normally see a difference.
// Case I
int i = 0;
while (i < 10) {
printf("%d\n", i);
i++;
printf("%d\n", i);
i++;
}
// Case II
int i = 0;
while (i < 10) {
printf("%d\n", i);
i++;
}
The Case I code takes more space but is faster, and the Case II code takes less space but is slower than Case I.
That is because, in programming, space and time usually trade off against each other, which means you must compromise on either space or time.
So you can optimize your time complexity or your space complexity, but not necessarily both.
And both of your code snippets do the same thing.