I've noticed that gcc12 does not optimize these two functions to the same code (with -O3):
int x = 0;
void f(bool a)
{
if (a) {
++x;
}
}
void f2(bool a)
{
x += a;
}
Basically no transformation is done. That can be seen here: https://godbolt.org/z/1G3n4fxEK
Optimizing f to the code in f2 seems to be trivial and no jump would be needed anymore. However, I'm curious if there's a reason why this is not done by gcc? Is it somehow still slower or something? I would assume it's never slower and sometimes faster, but I might be wrong.
Thanks
Such a substitution would be incorrect in a scenario where one thread calls f(1) while another thread calls f(0). If x is never actually accessed outside the first thread, there would be no race condition in the code as written, but the substitution would create one. If x is initially 1, nothing would prevent the code from being processed as:
thread 1: read x (yields 1)
thread 2: read x (yields 1)
thread 1: write 2
thread 2: write 1
This would leave x holding the value 1 even though thread 1 has just written the value 2. Worse than that, if the function was invoked within a context like:
x = 1;
f(1);
if (x != 1)
launch_nuclear_missiles_if_x_is_1_and_otherwise_make_coffee();
a compiler might recognize that x will always equal 2 following the return from f(1), and thus make the function call unconditional.
To be sure, such substitution would rarely cause problems in real-world situations, but the Standard explicitly forbids transformations that could create race conditions where none would exist in the source code as written.
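To make the difference concrete, here is a minimal sketch that spells out the load and store hidden in x += a. The name f2_spelled_out is made up for illustration; it is not what gcc emits, just the same operation written step by step.

```c
#include <stdbool.h>

int x = 0;

/* Source form: x is only written when a is true. */
void f(bool a)
{
    if (a) {
        ++x;
    }
}

/* Hypothetical spelled-out version of "x += a": x is read and written on
   every call, even when a is false. That unconditional store is the write
   the source form never performs. */
void f2_spelled_out(bool a)
{
    int tmp = x;   /* read x */
    tmp += a;      /* add 0 or 1 */
    x = tmp;       /* write x back, regardless of a */
}
```

In a single thread both functions leave x with the same value. The only difference is that the second one stores to x even when a is false, and that extra store is what another thread could observe or have overwritten.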
I would have hoped the compiler would have changed f2 to f. Reading and writing memory may require slow transactions to acquire a copy of the memory location, and update other bus controllers about the state of that location (Invalid -> Shared -> Modified).
Jumping around an update based on a register value is quite cheap; especially with the efficacy of branch predictors.
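A much simpler reason why that optimization isn't done is that the two functions are only equivalent when a is exactly 0 or 1. bool is a distinct type rather than an alias for int, but nothing physically stops you from reinterpreting the bytes of an arbitrary integer as a bool (this is undefined behavior, so the result shown is only what typically happens):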
int v = 5;
f2(*(bool *)&v);
// x = 5 here
Say I have a tight loop in C, within which I use the value of a global variable to do some arithmetics, e.g.
double c;
// ... initialize c somehow ...
double f(double*a, int n) {
double sum = 0.0;
int i;
for (i = 0; i < n; i++) {
sum += a[i]*c;
}
return sum;
}
with c the global variable. Is c "read anew from global scope" in each loop iteration? After all, it could've been changed by some other thread executing some other function, right? Hence would the code be faster by taking a local (function stack) copy of c prior to the loop and only use this copy?
double f(double*a, int n) {
double sum = 0.0;
int i;
double c_cp = c;
for (i = 0; i < n; i++) {
sum += a[i]*c_cp;
}
return sum;
}
Though I haven't specified how c is initialized, let's assume it's done in some way such that the value is unknown at compile time. Also, c is really a constant throughout runtime, i.e. I as the programmer know that its value won't change. Can I let the compiler in on this information, e.g. using static double c in the global scope? Does this change the a[i]*c vs. a[i]*c_cp question?
My own research
Reading e.g. the "Global variables" section of this, it seems clear that taking a local copy of the global variable is the way to go. However, they want to update the value of the global variable, whereas I only ever want to read its value.
Using godbolt I fail to notice any real difference in the assembly for both c vs. c_cp and double c vs. static double c.
Any decently smart compiler will optimize your code so that it behaves like your second snippet. Using static won't change much, but if you want to force a read on each iteration, declare c volatile.
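As a rough sketch of the two cases, the following puts a plain global next to a volatile one. The names c_volatile, sum_plain and sum_volatile are made up for this example, and whether the plain load is actually hoisted depends on your compiler and flags.

```c
double c = 2.0;                    /* plain global: the load may be hoisted out of the loop */
volatile double c_volatile = 2.0;  /* volatile: every read in the source must be performed */

double sum_plain(const double *a, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        sum += a[i] * c;           /* compiler is free to read c once */
    }
    return sum;
}

double sum_volatile(const double *a, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        sum += a[i] * c_volatile;  /* c_volatile is re-read on each iteration */
    }
    return sum;
}
```

Both functions return the same value as long as nothing modifies the globals during the loop. Note that volatile only forces the reads to happen; it does not make the access safe against other threads.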
Great point there about changes from a different thread. The compiler will maintain the integrity of your code only as far as single-threaded execution goes. That means it can reorder your code, skip something, add something -- as long as the end result is still the same.
With multiple threads it is your job to ensure that things still happen in a specific order, not just that the end result is right. The way to ensure that is with memory barriers. It's a fun topic to read about, but one that is best avoided unless you're an expert.
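For a taste of what that looks like, here is a small sketch using C11 atomics from <stdatomic.h>. The names payload, ready, publish and try_consume are invented for this example; it shows one common release/acquire pattern, not the only way to do it.

```c
#include <stdatomic.h>
#include <stdbool.h>

static double payload;      /* plain data handed from writer to reader */
static atomic_bool ready;   /* zero-initialized, i.e. false */

/* Writer side: store the data first, then set the flag with release
   ordering so the data write cannot be reordered after the flag write. */
void publish(double value)
{
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Reader side: an acquire load of the flag. If it sees true, the earlier
   write to payload is guaranteed to be visible as well. */
bool try_consume(double *out)
{
    if (!atomic_load_explicit(&ready, memory_order_acquire)) {
        return false;
    }
    *out = payload;
    return true;
}
```

One thread would call publish once while another polls try_consume. Without the release/acquire pair, the reader could see the flag set and still read a stale payload.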
Once everything is translated to machine code, you will get no difference whatsoever. If c is global, any access to c will reference the address of c or, most probably, in a tight loop c will be kept in a register, or in the worst case in the L1 cache.
On a Linux machine you can easily generate the assembly and examine the resultant code.
You can also run benchmarks.
Is one of these loops quicker than the other?
I've always used #2, thinking it was quicker to compare against zero than against another value, since the CMP instruction would be simpler to execute. But checking some ARM manuals, I don't see anything to confirm this. Does it depend on the instruction set and processor you're using? Is it ever true?
//#1
while(1)
{
static uint8_t counter = 0;
counter++;
if(counter == 4)
{
counter = 0;
//do something
}
}
//#2
while(1)
{
static uint8_t counter = 4;
counter--;
if(counter == 0)
{
counter = 4;
//do something
}
}
It's hard to tell. Focusing on the release-mode build, it largely depends on the context, and you haven't provided all of it; in particular, the missing loop exit condition makes it impossible to figure out.
Usually, if the number of iterations is an immediate value, the compiler will convert the loop construct to a fast counting down to zero one as long as there is no loop counter dependency inside the loop.
Anyway, on modern, superscalar architectures such as the Cortex-A series, a simple ALU instruction such as cmp will be well "hidden" and thus, won't cost an extra cycle most of the time.
What actually hurts the performance more is the static declaration of counter that automatically translates to memory RW. Avoid this if possible.
Further, if you simply want "do something" to run every fourth iteration, if ((counter & 3) == 0) could be the better solution, since it removes the need to reset the counter. Again, it all depends on the context (the length of "do something"), which you didn't provide.
As a side note, local variables had better be 32-bit unless you have a good reason to declare them otherwise, since anything narrower may translate to additional modulo-related instructions such as uxtb, and, etc.
Counting down the loop counter to zero is a no brainer, but there are many more things to consider if you want the maximum performance.
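Here is a minimal sketch of the mask idea with a 32-bit local counter. The function name count_every_fourth and the fired tally are made up so the behaviour can be checked; in real code the tally would be your "do something".

```c
#include <stdint.h>

/* Hypothetical helper: returns how many times the "do something" branch
   fires over `iterations` passes of the loop. */
uint32_t count_every_fourth(uint32_t iterations)
{
    uint32_t counter = 0;   /* 32-bit local instead of a static uint8_t */
    uint32_t fired = 0;

    for (uint32_t n = 0; n < iterations; n++) {
        counter++;
        if ((counter & 3u) == 0) {   /* true whenever counter is a multiple of 4 */
            fired++;                 /* stand-in for "do something" */
        }
    }
    return fired;
}
```

The counter is never reset; it simply wraps around, and since 2^32 is a multiple of 4 the pattern survives the wrap.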
My experience with C is relatively modest, and I lack a good understanding of its compiled output on modern CPUs. The context: I'm working on image processing for an Android app. I have read that branch-free machine code is preferred for inner loops, so I'd like to know whether there could be a significant performance difference between something like this:
if (p) { double for loop, computing f() }
else if (q) { double for loop, computing g() }
else { double for loop, computing h() }
Versus the less verbose version which does the condition checking within the loop:
for (int i = 0; i < xRes; i++)
{
for (int j = 0; j < yRes; j++)
{
image[i][j] = p ? f() : (q ? g() : h());
}
}
In this code, p and q are expressions like mode == 3, where mode is passed into the function and never changed within it. I have three simple questions:
(1) Would the first, more verbose version compile to more efficient code than the second version?
(2) For the second version, would performance improve if I evaluate and store the results of p and q above the loop, so I can replace the boolean expressions in the loop with variables?
(3) Should I even be worried about this, or will branch prediction (or some other optimization) ensure the boolean expressions in the loop(s) are almost never evaluated anyway?
Finally, I'd be delighted if someone can say whether the answers to these 3 questions depend on the architecture. I'm interested in the main Android NDK platforms: ARM, MIPS, x86 etc. My thanks in advance!
It looks like the question was already well-answered here: the compiler probably performs loop unswitching, removing the conditional from the loop and automatically generating 3 copies of the loop, just like stark suggested. Moreover, from comments given there and above, it seems branch prediction works very well for loops like these.
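To picture what loop unswitching does, here is a hand-written sketch of the before and after. Everything in it is a stand-in: X_RES, Y_RES, the mode values 3 and 5, and the trivial f, g and h are assumptions for illustration, and a real compiler does this on its internal representation rather than on the source.

```c
enum { X_RES = 4, Y_RES = 3 };   /* hypothetical image size */

static int f(void) { return 1; } /* stand-ins for the real per-pixel work */
static int g(void) { return 2; }
static int h(void) { return 3; }

/* Condition tested inside the inner loop, as in the compact version. */
void fill_in_loop(int image[X_RES][Y_RES], int mode)
{
    for (int i = 0; i < X_RES; i++) {
        for (int j = 0; j < Y_RES; j++) {
            image[i][j] = (mode == 3) ? f() : ((mode == 5) ? g() : h());
        }
    }
}

/* Unswitched by hand: the loop-invariant test is done once, outside. */
void fill_unswitched(int image[X_RES][Y_RES], int mode)
{
    if (mode == 3) {
        for (int i = 0; i < X_RES; i++)
            for (int j = 0; j < Y_RES; j++)
                image[i][j] = f();
    } else if (mode == 5) {
        for (int i = 0; i < X_RES; i++)
            for (int j = 0; j < Y_RES; j++)
                image[i][j] = g();
    } else {
        for (int i = 0; i < X_RES; i++)
            for (int j = 0; j < Y_RES; j++)
                image[i][j] = h();
    }
}
```

Both functions fill the image identically. The second form is what the optimizer can derive from the first because mode never changes inside the loops.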
Many times I need to do things TWICE in a for loop. Simply I can set up a for loop with an iterator and go through it twice:
for (i = 0; i < 2; i++)
{
// Do stuff
}
Now I am interested in doing this as SIMPLY as I can, perhaps without an initializer or iterator? Are there any other, really simple and elegant, ways of achieving this?
This is elegant because it looks like a triangle; and triangles are elegant.
i = 0;
here: dostuff();
i++; if ( i == 1 ) goto here;
Encapsulate it in a function and call it twice.
void do_stuff() {
// Do Stuff
}
// .....
do_stuff();
do_stuff();
Note: if you use variables or parameters of the enclosing function in the stuff logic, you can pass them as arguments to the extracted do_stuff function.
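A small sketch of that note, with made-up names (total, step, caller): the locals that the repeated code touches are passed in, by pointer when they need to be modified.

```c
/* Hypothetical "stuff": adds step to a running total owned by the caller. */
static void do_stuff(int *total, int step)
{
    *total += step;
}

int caller(void)
{
    int total = 0;   /* local state the repeated code needs to modify */
    int step = 5;    /* local value the repeated code only reads */

    do_stuff(&total, step);
    do_stuff(&total, step);

    return total;
}
```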
If it's only twice, and you want to avoid a loop, just write the darn thing twice.
statement1;
statement1; // (again)
If the loop is too verbose for you, you can also define an alias for it:
#define TWICE for (int _index = 0; _index < 2; _index++)
This would result in code like this:
TWICE {
// Do Stuff
}
// or
TWICE
func();
I would only recommend using this macro if you have to do this very often; otherwise I think the plain for loop is more readable.
Unfortunately, this is not for C, but for C++ only, but does exactly what you want:
Just include the header, and you can write something like this:
10 times {
// Do stuff
}
I'll try to rewrite it for C as well.
So, after some time, here's an approach that enables you to write the following in pure C:
2 times {
do_something();
}
Example:
You'll have to include this little thing as a simple header file (I always called the file extension.h). Then, you'll be able to write programs in the style of:
#include<stdio.h>
#include"extension.h"
int main(int argc, char** argv){
3 times printf("Hello.\n");
3 times printf("Score: 0 : %d\n", _);
2 times {
printf("Counting: ");
9 times printf("%d ", _);
printf("\n");
}
5 times {
printf("Counting up to %d: ", _);
_ times printf("%d ", _);
printf("\n");
}
return 0;
}
Features:
Simple notation of simple loops (in the style depicted above)
Counter is implicitly stored in a variable called _ (a simple underscore).
Nesting of loops allowed.
Restrictions (and how to (partially) circumvent them):
Works only for a certain number of loops (which is - "of course" - reasonable, since you only would want to use such a thing for "small" loops). Current implementation supports a maximum of 18 iterations (higher values result in undefined behaviour). Can be adjusted in header file by changing the size of array _A.
Only a certain nesting depth is allowed. Current implementation supports a nesting depth of 10. Can be adjusted by redefining the macro _Y.
Explanation:
You can see the full (=de-obfuscated) source-code here. Let's say we want to allow up to 18 loops.
Retrieving upper iteration bound: The basic idea is to have an array of chars that are initially all set to 0 (this is the array counterarray). If we issue a call to e.g. 2 times {do_it;}, the macro times shall set the element at index 2 of counterarray to 1 (i.e. counterarray[2] = 1). In C, it is possible to swap index and array name in such an assignment, so we can write 2[counterarray] = 1 to achieve the same. This is exactly what the macro times does as its first step. Then, we can later scan the array counterarray until we find an element that is not 0, but 1. The corresponding index is then the upper iteration bound. It is stored in the variable searcher. Since we want to support nesting, we have to store the upper bound for each nesting depth separately; this is done by searchermax[depth]=searcher+1.
Adjusting current nesting depth: As said, we want to support nesting of loops, so we have to keep track of the current nesting depth (done in the variable depth). We increment it by one if we start such a loop.
The actual counter variable: We have a "variable" called _ that implicitly gets assigned the current counter. In fact, we store one counter for each nesting depth (all stored in the array counter). Then, _ is just another macro that retrieves the proper counter for the current nesting depth from this array.
The actual for loop: We break the for loop into parts:
We initialize the counter for the current nesting depth to 0 (done by counter[depth] = 0).
The iteration step is the most complicated part: We have to check if the loop at the current nesting depth has reached its end. If so, we have to update the nesting depth accordingly. If not, we have to increment the current nesting depth's counter by 1. The variable lastloop is 1 if this is the last iteration, otherwise 0, and we adjust the current nesting depth accordingly. The main problem here is that we have to write this as a sequence of expressions, all separated by commas, which requires us to write all these conditions in a very non-straightforward way.
The "increment step" of the for loop consists of only one assignment, that increments the appropriate counter (i.e. the element of counter of the proper nesting depth) and assigns this value to our "counter variable" _.
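The index-swap trick from the first step can be tried in isolation. The sketch below is not the real header; it only reuses the names counterarray and searcher from the explanation, while MAX_ITERATIONS, mark_two and find_upper_bound are made up, and resetting the mark after it is found is an assumption.

```c
enum { MAX_ITERATIONS = 18 };                  /* assumed limit, as in the text */

static char counterarray[MAX_ITERATIONS + 1];  /* all elements start at 0 */

/* What the prefix "2" followed by [counterarray] = 1 boils down to. */
void mark_two(void)
{
    2[counterarray] = 1;   /* identical to counterarray[2] = 1 */
}

/* Scan for the marked element; its index is the upper iteration bound.
   The mark is cleared so the array can be reused (an assumption here). */
int find_upper_bound(void)
{
    for (int searcher = 0; searcher <= MAX_ITERATIONS; searcher++) {
        if (counterarray[searcher] != 0) {
            counterarray[searcher] = 0;
            return searcher;
        }
    }
    return 0;   /* nothing was marked */
}
```

This works because a[i] is defined as *(a + i), and the addition is commutative, so 2[counterarray] and counterarray[2] name the same element.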
What about this??
void DostuffFunction(){}
for (unsigned i = 0; i < 2; ++i, DostuffFunction());
Regards,
Pablo.
What abelenky said.
And if your { // Do stuff } is multi-line, make it a function, and call that function -- twice.
Many people suggest writing out the code twice, which is fine if the code is short. There is, however, a size of code block which would be awkward to copy but is not large enough to merit its own function (especially if that function would need an excessive number of parameters). My own normal idiom to run a loop 'n' times is
i = number_of_reps;
do
{
... whatever
} while(--i);
In some measure because I'm frequently coding for an embedded system where the up-counting loop is often inefficient enough to matter, and in some measure because it's easy to see the number of repetitions. Running things twice is a bit awkward because the most efficient coding on my target system
bit rep_flag;
rep_flag = 0;
do
{
...
} while(rep_flag ^= 1); /* Note: if loop runs to completion, leaves rep_flag clear */
doesn't read terribly well. Using a numeric counter suggests the number of reps can be varied arbitrarily, which in many instances won't be the case. Still, a numeric counter is probably the best bet.
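Both idioms can be checked with a short sketch. The function names and the runs tally are made up, and a plain unsigned stands in for the target-specific bit type.

```c
/* Count-down idiom: the body runs number_of_reps times.
   number_of_reps must be at least 1, since the body runs before the test. */
unsigned run_n_times(unsigned number_of_reps)
{
    unsigned i = number_of_reps;
    unsigned runs = 0;

    do {
        runs++;               /* stand-in for "... whatever" */
    } while (--i);

    return runs;
}

/* Flag-toggle idiom: the body runs exactly twice. */
unsigned run_twice_with_flag(void)
{
    unsigned rep_flag = 0;    /* stand-in for the embedded `bit` type */
    unsigned runs = 0;

    do {
        runs++;
    } while ((rep_flag ^= 1));   /* 0 -> 1 (loop again), 1 -> 0 (stop) */

    return runs;
}
```

The extra parentheses around the assignment in the second loop are only there to keep compilers from warning about an assignment used as a condition.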
As Edsger W. Dijkstra himself put it : "two or more, use a for". No need to be any simpler.
Another attempt:
for(i=2;i--;) /* Do stuff */
This solution has many benefits:
Shortest form possible, I claim (13 chars)
Still, readable
Includes initialization
The amount of repeats ("2") is visible in the code
Can be used as a toggle (1 or 0) inside the body e.g. for alternation
Works with single instruction, instruction body or function call
Flexible (doesn't have to be used only for "doing twice")
Dijkstra compliant ;-)
From comment:
for (i=2; i--; "Do stuff");
Use function:
func();
func();
Or use macro (not recommended):
#define DO_IT_TWICE(A) A; A
DO_IT_TWICE({ x+=cos(123); func(x); })
If your compiler supports this just put the declaration inside the for statement:
for (unsigned i = 0; i < 2; ++i)
{
// Do stuff
}
This is as elegant and efficient as it can be. Modern compilers can do loop unrolling and all that stuff, trust them. If you don't trust them, check the assembler.
And it has one little advantage to all other solutions, for everybody it just reads, "do it twice".
Assuming C++0x lambda support:
template <typename T> void twice(T t)
{
t();
t();
}
twice([](){ /*insert code here*/ });
Or:
twice([]()
{
/*insert code here*/
});
Which doesn't help you since you wanted it for C.
Good rule: three or more, do a for.
I think I read that in Code Complete, but I could be wrong. So in your case you don't need a for loop.
This is the shortest possible without preprocessor/template/duplication tricks:
for(int i=2; i--; ) /*do stuff*/;
Note that the decrement happens once right at the beginning, which is why this will loop precisely twice with the indices 1 and 0 as requested.
Alternatively you can write
for(int i=2; i--; /*do stuff*/) ;
But that's purely a difference of taste.
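If you want to convince yourself of the order of the indices, a tiny sketch like this records them. The function name record_indices and the out array are made up for the check.

```c
/* Stores each value of i seen inside the loop body and returns how many
   iterations ran. out must have room for two ints. */
int record_indices(int out[2])
{
    int count = 0;

    for (int i = 2; i--; ) {
        out[count++] = i;   /* i has already been decremented here */
    }
    return count;
}
```

The condition i-- tests the old value (2, then 1, then 0) and the body sees the decremented one, so the loop body runs with i equal to 1 and then 0.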
If what you are doing is somewhat complicated wrap it in a function and call that function twice? (This depends on how many local variables your do stuff code relies on).
You could do something like
void do_stuff(int i){
// do stuff
}
do_stuff(0);
do_stuff(1);
But this may get extremely ugly if you are working on a whole bunch of local variables.
//dostuff
stuff;
//dostuff (Attention I am doing the same stuff for the :**2nd** time)
stuff;
First, use a comment
/* Do the following stuff twice */
then,
1) use the for loop
2) write the statement twice, or
3) write a function and call the function twice
Do not use macros; as stated earlier, macros are evil.
(My answer's almost a triangle)
What is elegance? How do you measure it? Is someone paying you to be elegant? If so how do they determine the dollar-to-elegance conversion?
When I ask myself, "how should this be written," I consider the priorities of the person paying me. If I'm being paid to write fast code, control-c, control-v, done. If I'm being paid to write code fast, well.. same thing. If I'm being paid to write code that occupies the smallest amount of space on the screen, I short the stock of my employer.
A jump instruction is pretty slow, so if you write the lines one after the other, it would run faster than writing a loop. But modern compilers are very, very smart and the optimizations are great (if they are allowed, of course). If you have turned on your compiler's optimizations, it doesn't matter how you write it, with a loop or not (:
EDIT : http://en.wikipedia.org/wiki/compiler_optimizations just take a look (:
Close to your example, elegant and efficient:
for (i = 2; i; --i)
{
/* Do stuff */
}
Here's why I'd recommend that approach:
It initializes the iterator to the number of iterations, which makes intuitive sense.
It uses decrement over increment so that the loop test expression is a comparison to zero (the "i;" can be interpreted as "is i true?" which in C means "is i non-zero"), which may optimize better on certain architectures.
It uses pre-decrement as opposed to post-decrement in the counting expression for the same reason (may optimize better).
It uses a for loop instead of do/while or goto or XOR or switch or macro or any other trick approach because readability and maintainability are more elegant and important than clever hacks.
It doesn't require you to duplicate the code for "Do stuff" so that you can avoid a loop. Duplicated code is an abomination and a maintenance nightmare.
If "Do stuff" is lengthy, move it into a function and give the compiler permission to inline it if beneficial. Then call the function from within the for loop.
I like Chris Case's solution (up here), but C language doesn't have default parameters.
My solution:
bool cy = false;
do {
// Do stuff twice
} while (cy = !cy);
If you want, you could do different things in the two cycle by checking the boolean variable (maybe by ternary operator).
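For example, a sketch along these lines uses cy to pick a different value on each pass. The function two_passes and its parameters are made up; in C you need <stdbool.h> for bool.

```c
#include <stdbool.h>

/* Builds a two-digit number: the first pass contributes `first`,
   the second pass contributes `second`. */
int two_passes(int first, int second)
{
    int result = 0;
    bool cy = false;

    do {
        /* cy is false on the first pass and true on the second */
        result = result * 10 + (cy ? second : first);
    } while ((cy = !cy));   /* false -> true (repeat), true -> false (stop) */

    return result;
}
```

Here two_passes(1, 2) yields 12, which shows both that the body ran twice and in which order the two branches were taken.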
void loopTwice (bool first = true)
{
// Recursion is your friend
if (first) {loopTwice(false);}
// Do Stuff
...
}
I'm sure there's a more elegant way, but this is simple to read, and pretty simple to write. There might even be a way to eliminate the bool parameter, but this is what I came up with in 20 seconds.