I have a loop with a switch inside that looks something like this (but much more complex).
for(int i = 0; i < N; i += inc)
{
v = 0.0;
switch(ConstVal)
{
case 2: v += A[i];
case 1: v += A[i+k];
case 0: v += A[i+j];
}
// do some stuff with v
}
ConstVal is unknown at compile time but fixed during the initialization routine. Is there any way to remove the switch statement without compiling multiple variants of the for loop? Given that x86 has indirect branching, there should be a simple way to use inline assembly to jump to the case I want, rather than back to the top of the loop on each iteration. How would you do this (in gcc)? Finally, can this be done without interfering with the compiler's optimization analysis? I'm already manually unrolling loops, but I'm sure there are many more optimizations going on which I don't want to break.
It's my understanding that Julia's metaprogramming features give you access to the parser and the abstract syntax tree; in combination with JIT compilation, you can resolve this kind of problem. I would think there would be some reasonable workaround in C even without language semantics for indirect branches. Note that Duff's device is not a solution, since I want to return to the same case statement on each loop iteration. This issue comes up frequently.
EDIT
I discovered that x86 has no conditional indirect branch instruction. Further, gcc inline assembly can only jump to fixed labels. And yet, using gcc extensions, this can still be done. See, for example, https://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html#Labels-as-Values.
Here is how. In my code it was difficult to tell whether there was any performance difference, but on another machine, or with a much smaller and simpler loop, it may make a difference.
void *forswitch;
switch(ConstVal)
{
case 2: forswitch = &&C; break;
case 1: forswitch = &&B; break;
case 0: forswitch = &&A; break;
}
void *_forswitch[] = {forswitch, &&END_FOR_SWITCH};
i = 0;
v = 0.0;
goto *_forswitch[i >= N];        // enter at the selected case (skips the loop if N <= 0)
C: v += _A[i];
B: v += _A[i+k];
A: v += _A[i+j];
// do some stuff with v
v = 0.0;                         // reset here: the jump back lands past any code above C
i += inc;
forswitch = _forswitch[i >= N];  // >= rather than ==, in case inc does not divide N
//forswitch = (i<N)? forswitch: &&END_FOR_SWITCH;
goto *forswitch;
END_FOR_SWITCH:
return;
I've replaced the for loop with my own implementation based on the gcc "labels as values" extension, which gives access to machine-level indirect branching. There are a couple of ways to close the loop. The first is to index a two-element array of label addresses with the loop test, so the jump goes either back to the selected case or past the end of the loop. The other (commented) is to conditionally select the jump target each time; the compiler will likely turn that ternary into a conditional move (CMOV) rather than a branch.
There are a number of obvious issues with this solution. (1) It's not portable. (2) By implementing the loop myself, the code is harder to understand and may interfere with compiler optimizations (such as automatic loop unrolling). (3) The compiler can't jointly optimize the entire switch statement, even though there are no breaks, because it has no knowledge at compile time of which statements will actually be executed. However, it may be able to cleverly reorganize the switch in a manner similar to what others have pointed out in some of the responses below. By implementing the switch myself (in combination with the for loop), I'm making any such optimization much more difficult for the compiler, since removing the switch semantics obscures my intent.
Nevertheless, if it made a significant performance improvement, I still think this would be better than having multiple copies of the code. With macros, the non-portable extensions could probably be compiled conditionally; this could basically be made to look like a normal loop.
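For reference, here is a self-contained sketch of the same computed-goto technique, assuming GCC or Clang (both accept the labels-as-values extension). The names sum_switch and sum_goto are hypothetical, "do some stuff with v" is replaced by summing v, the exit test is i >= n so a step size that does not divide n still terminates, and v is reset before each jump back into the middle of the body. The plain switch version is included so the two can be compared.

```c
/* Plain reference version of the loop from the question.
   "Do some stuff with v" is replaced by summing v (illustration only). */
double sum_switch(const double *a, int n, int inc, int j, int k, int constval)
{
    double total = 0.0;
    for (int i = 0; i < n; i += inc) {
        double v = 0.0;
        switch (constval) {
        case 2: v += a[i];     /* fall through */
        case 1: v += a[i + k]; /* fall through */
        case 0: v += a[i + j];
        }
        total += v;
    }
    return total;
}

/* Computed-goto version using GNU C "labels as values" (&&label, goto *ptr).
   Assumes constval is 0, 1 or 2. */
double sum_goto(const double *a, int n, int inc, int j, int k, int constval)
{
    void *entry;
    switch (constval) {
    case 2:  entry = &&C; break;
    case 1:  entry = &&B; break;
    default: entry = &&A; break;
    }
    void *const next[2] = { entry, &&done };  /* [0]: run again, [1]: exit */

    double total = 0.0, v = 0.0;
    int i = 0;
    goto *next[i >= n];        /* jump straight to the selected case, or skip */
C:  v += a[i];
B:  v += a[i + k];
A:  v += a[i + j];
    total += v;                /* "do some stuff with v" */
    v = 0.0;                   /* reset before jumping back into the body */
    i += inc;
    goto *next[i >= n];
done:
    return total;
}
```

Called with the same arguments and constval 0, 1 or 2, both functions should return identical results; whether the goto version is actually faster is something only profiling can tell.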
EDIT 2
I've found a much better solution which is more portable and more effective. When you have a situation where there are a small number of possible run-time determined options, you can make a wrapper around the optimized function, fix all the run-time constants, and then inline the function for each copy of constants. If there is only a single constant, you can use a lookup table of function pointers, each of which sets the constant and inlines the function. If you have a more complex situation, you'll need some if-elseif-else control structure. One of the functions can be left with all free variables so there is no loss of generality. I'm thinking of this as being a sort of compile-time closure. The compiler is doing all the hard work without any messy macros or otherwise duplicate code to maintain.
In my code, this resulted in a 10% to 20% performance increase on already significantly optimized code (due to hard-coding of various constants, not really anything to do with the switch itself). In my toy example, the change would look something like this.
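/* static inline avoids a missing external definition when the call is not inlined.
   (Names beginning with __ are reserved; a real project should pick another name.) */
static inline void __foo(const int ConstVal)
{
    for(int i = 0; i < N; i += inc)
    {
        v = 0.0;
        switch(ConstVal)
        {
        case 2: v += A[i];   /* fall through */
        case 1: v += A[i+k]; /* fall through */
        case 0: v += A[i+j];
        }
        // do some stuff with v
    }
}
void foo(void)
{
    // this could be done with a lookup table
    switch(ConstVal)
    {
    case 2: __foo(2); break;   /* "case 2:", not the plain label "case2:" */
    case 1: __foo(1); break;
    case 0: __foo(0); break;
    }
}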
By inlining __foo, the compiler can eliminate the switch and propagate any other constants that you pass along. You will, of course, end up with more compiled code, but for a small optimized routine this shouldn't be a big deal.
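The lookup-table variant mentioned in the comment can be sketched as follows. This is an illustration under assumptions, not the poster's code: kernel, kernel_c0 to kernel_c2 and run_kernel are hypothetical names, the globals are turned into parameters so the sketch is self-contained, and "do some stuff with v" is replaced by summing v. Each wrapper passes a literal constant, so once kernel is inlined into it the compiler is free to resolve the switch; an out-of-range value falls back to the generic call, which plays the role of the function left with all free variables.

```c
/* Generic kernel; when constval is a literal at an inlined call site,
   the compiler is free to resolve the switch at compile time. */
static inline double kernel(const double *a, int n, int inc, int j, int k,
                            const int constval)
{
    double total = 0.0;
    for (int i = 0; i < n; i += inc) {
        double v = 0.0;
        switch (constval) {
        case 2: v += a[i];     /* fall through */
        case 1: v += a[i + k]; /* fall through */
        case 0: v += a[i + j];
        }
        total += v;            /* stands in for "do some stuff with v" */
    }
    return total;
}

typedef double (*kernel_fn)(const double *, int, int, int, int);

/* One thin wrapper per constant, each passing a literal. */
static double kernel_c0(const double *a, int n, int inc, int j, int k)
{ return kernel(a, n, inc, j, k, 0); }
static double kernel_c1(const double *a, int n, int inc, int j, int k)
{ return kernel(a, n, inc, j, k, 1); }
static double kernel_c2(const double *a, int n, int inc, int j, int k)
{ return kernel(a, n, inc, j, k, 2); }

static const kernel_fn kernel_table[3] = { kernel_c0, kernel_c1, kernel_c2 };

/* Picks the specialized copy once, outside the hot loop. */
double run_kernel(const double *a, int n, int inc, int j, int k, int constval)
{
    if (constval >= 0 && constval <= 2)
        return kernel_table[constval](a, n, inc, j, k);
    return kernel(a, n, inc, j, k, constval);   /* generic fallback */
}
```

The table lookup and indirect call happen once per run_kernel call rather than once per iteration, which is the whole point of hoisting the decision out of the loop.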
No, I don't see any way to optimize the switch statement away. Besides, it is not that expensive. Because there are no break statements, the cases fall through. It translates to:
switch(ConstVal)
{
case 2: v= A[i] + A[i+k] + A[i+j]; break;
case 1: v= A[i+k] + A[i+j]; break;
case 0: v= A[i+j]; break;
}
// do some stuff with v
and I don't see how to remove the dependency on ConstVal.
You could make a switch before the loop with 3 loops, one for each value of ConstVal, but that will surely look like ugly code, depending on what "do some stuff with v" does.
When do you know ConstVal, and how often does it change? If you can recompile a small routine and re-link the whole thing whenever ConstVal changes, that will solve your problem.
Having said that, do you know this is a problem?
Is that switch responsible for 10% or more of execution time?
It is very common for people to focus on non-problems.
They know they should "profile first" but they don't actually do it.
(It's a case of "ready-fire-aim" :)
A method many people rely on is this.
Is there any way to remove the switch statement without compiling multiple variants of the for loop?
In general it's better either to make different variants or to just leave it as is. But in this case maybe we can invent a trick, something like this:
switch (constVal) {
case 2:
    A1 = A;
    B1 = A + k;
    C1 = A + j;
    break;
case 1:
    A1 = big_zero_array;   /* must hold at least N zeros */
    B1 = A + k;
    C1 = A + j;
    break;
case 0:
    A1 = B1 = big_zero_array;
    C1 = A + j;
    break;
}
for (int i = 0; i < N; i += inc) {
    v = A1[i] + B1[i] + C1[i];
    //....
}
Still, this requires additional memory (the zero array must hold at least N elements) and may even be slower in some circumstances.
I'm trying to learn how to optimize code (I'm also learning C), and in one of my books there's a problem about optimizing Horner's method for evaluating polynomials. I'm a little lost on how to approach the problem. I'm not great at recognizing what needs optimizing.
Any advice on how to make this function run faster would be appreciated.
Thanks
double polyh(double a[], double x, int degree) {
long int i;
double result = a[degree];
for (i = degree-1; i >= 0; i--)
result = a[i] + x*result;
return result;
}
You really need to profile your code to test whether proposed optimizations really help. For example, it may be the case that declaring i as long int rather than int slows the function on your machine, but on the other hand it may make no difference on your machine but might make a difference on others, etc. Anyway, there's no reason to declare i a long int when degree is an int, so changing it probably won't hurt. (But still profile!)
Horner's rule is supposedly optimal in terms of the number of multiplies and adds required to evaluate a polynomial, so I don't see much you can do with it. One thing that might help (profile!) is changing the test i>=0 to i!=0. Of course, then the loop doesn't run enough times, so you'll have to add a line below the loop to take care of the final case.
Alternatively you could use a do { ... } while (--i) construct. (Or is it do { ... } while (i--)? You figure it out.)
You might not even need i, but using degree instead will likely not save an observable amount of time and will make the code harder to debug, so it's not worth it.
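Here is a sketch of both suggestions (polyh_ne and polyh_do are hypothetical names). Note the degree == 0 guard in each: without it, the i != 0 loop would start at -1 and never reach 0, and the do-while body would read a[-1]. With do { ... } while (i--), the post-decrement tests the old value, so starting at degree - 1 still runs the body with i == 0 before exiting.

```c
/* Variant 1: loop while i != 0, then handle a[0] after the loop. */
double polyh_ne(const double a[], double x, int degree)
{
    double result = a[degree];
    if (degree == 0)
        return result;             /* otherwise i would start at -1 */
    int i;
    for (i = degree - 1; i != 0; i--)
        result = a[i] + x * result;
    return a[0] + x * result;      /* the final case the loop skipped */
}

/* Variant 2: do { ... } while (i--); the test sees the old value of i. */
double polyh_do(const double a[], double x, int degree)
{
    double result = a[degree];
    if (degree == 0)
        return result;             /* no lower-order terms to fold in */
    int i = degree - 1;
    do {
        result = a[i] + x * result;
    } while (i--);
    return result;
}
```

Both compute the same value as polyh; whether either is faster is, as above, something to profile.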
Another thing that might help (I doubt it, but profile!) is breaking up the arithmetic expression inside the loop and playing around with order, like
for (i = degree-1; i >= 0; i--) {
    result *= x;
    result += a[i];
}
which may reduce the need for temporary variables/registers. Try it out.
Some suggestions:
You may use int instead of long int for the loop index.
Almost certainly the problem is inviting you to conjecture on the values of a. If that vector is mostly zeros, then you'll go faster (by doing fewer double multiplications, which will be the clear bottleneck on most machines) by computing only the values of a[i] * x^i for a[i] != 0. In turn the x^i values can be computed by careful repeated squaring, preserving intermediate terms so that you never compute the same partial power more than once. See the Wikipedia article if you've never implemented repeated squaring.
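A minimal sketch of the sparse approach, assuming the coefficients are still stored densely in a[] (poly_sparse and ipow are hypothetical names). The running power x^last is kept between nonzero terms, so only the gap between consecutive exponents is computed by repeated squaring and no partial power is recomputed from scratch.

```c
/* Exponentiation by squaring: x^e with O(log e) multiplications. */
static double ipow(double x, unsigned e)
{
    double result = 1.0;
    while (e) {
        if (e & 1u)
            result *= x;
        x *= x;
        e >>= 1;
    }
    return result;
}

/* Evaluates sum of a[i] * x^i, multiplying only for nonzero a[i]. */
double poly_sparse(const double a[], double x, int degree)
{
    double sum = 0.0;
    double xpow = 1.0;   /* x^last, reused for the next nonzero term */
    int last = 0;
    for (int i = 0; i <= degree; i++) {
        if (a[i] == 0.0)
            continue;
        xpow *= ipow(x, (unsigned)(i - last));  /* only the gap is new work */
        last = i;
        sum += a[i] * xpow;
    }
    return sum;
}
```

For a dense polynomial this does more work than Horner's rule, so it only pays off when most coefficients are zero.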
This is a function to get sum of the digits of a number:
int sumOfDigits(int n)
{
int sum=0; //line 1
if(n==0)
return sum;
else
{
sum=(n%10)+sumOfDigits(n/10); //line 2
// return sum; //line 3
}
}
While writing this code, I realized the scope of the local variables is local to each individual recursion of the function. So am I right in saying that if n=11111, 5 sum variables are created and pushed on the stack with each recursion? If this is correct then what is the benefit of using recursion when I can do it in normal function using loops, thus overwriting only one memory location? If I use pointers, recursion will probably take similar memory as a normal function.
Now my second question: even though this function gives me the correct result each time, I don't see how the recursions (other than the last one which returns 0) return values without uncommenting line 3. (using geany with gcc)
I'm new to programming, so please pardon any mistakes
So am I right in saying that if n=11111, 5 sum variables are created and pushed on the stack with each recursion?
Conceptually, but compilers may turn some forms of recursion into jumps/loops. E.g. a compiler that does tail call optimization may turn
void rec(int i)
{
if (i > 0) {
printf("Hello, level %d!\n", i);
rec(i - 1);
}
}
into the equivalent of
void loop(int i)
{
for (; i > 0; i--)
printf("Hello, level %d!\n", i);
}
because the recursive call is in tail position: when the call is made, the current invocation of rec has no more work to do except a return to its caller, so it might as well reuse its stack frame for the next recursive call.
If this is correct then what is the benefit of using recursion when I can do it in normal function using loops, thus overwriting only one memory location? If I use pointers, recursion will probably take similar memory as a normal function.
For this problem, recursion is a pretty bad fit, at least in C, because a loop is much more readable. There are problems, however, where recursion is easier to understand. Algorithms on tree structures are the prime example.
(Although every recursion can be emulated by a loop with an explicit stack, and then stack overflows can be more easily caught and handled.)
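To make the tree example and the explicit-stack remark concrete, here is a sketch with a hypothetical struct node: a recursive sum and an equivalent loop that keeps pending subtrees on its own stack. The fixed capacity of 64 is an arbitrary assumption; the point is that running out of it is reported through the return value instead of crashing the program.

```c
#include <stddef.h>

struct node { int value; struct node *left, *right; };

/* Recursive version: the call stack holds the pending subtrees. */
int tree_sum_rec(const struct node *t)
{
    if (t == NULL)
        return 0;
    return t->value + tree_sum_rec(t->left) + tree_sum_rec(t->right);
}

#define STACK_CAP 64   /* arbitrary capacity for the sketch */

/* Loop version with an explicit stack: returns 0 and stores the sum in
   *out on success, or -1 if the stack would overflow. */
int tree_sum_iter(const struct node *root, int *out)
{
    const struct node *stack[STACK_CAP];
    size_t top = 0;
    int sum = 0;
    if (root != NULL)
        stack[top++] = root;
    while (top > 0) {
        const struct node *t = stack[--top];
        sum += t->value;
        if (t->left != NULL) {
            if (top == STACK_CAP) return -1;   /* detected, not a crash */
            stack[top++] = t->left;
        }
        if (t->right != NULL) {
            if (top == STACK_CAP) return -1;
            stack[top++] = t->right;
        }
    }
    *out = sum;
    return 0;
}
```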
I don't understand the remark about pointers.
I don't see how the recursions (other than the last one which returns 0) return values without uncommenting line 3.
By chance. The program exhibits undefined behavior, so it may do anything, even return the correct answer.
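For comparison, here is a corrected recursive version that returns a value on every path, plus a loop version that needs only one sum variable (sumOfDigitsLoop is a hypothetical name).

```c
/* Corrected recursive version: every path returns a value. */
int sumOfDigits(int n)
{
    if (n == 0)
        return 0;
    return (n % 10) + sumOfDigits(n / 10);
}

/* Equivalent loop: a single sum variable, no extra stack frames. */
int sumOfDigitsLoop(int n)
{
    int sum = 0;
    while (n != 0) {
        sum += n % 10;
        n /= 10;
    }
    return sum;
}
```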
So am I right in saying that if n=11111, 5 sum variables are created and pushed on the stack with each recursion?
The recursion goes six calls deep (n = 11111, 1111, 111, 11, 1 and finally 0), so traditionally six stack frames will eventually be created (but read below!), each of which has space for its own sum variable. So this is mostly correct in spirit.
If this is correct then what is the benefit of using recursion when I can do it in normal function using loops, thus overwriting only one memory location?
There are several reasons, which include:
it might be more natural to express an algorithm recursively; if the performance is acceptable, maintainability counts for a lot
simple recursive solutions typically do not keep state, which means they are trivially parallelizable, which is a major advantage in the multicore era
compiler optimizations frequently negate the drawbacks of recursion
I don't see how the recursions (other than the last one which returns 0) return values without uncommenting line 3.
It's undefined behavior to comment out line 3. Why would you do that?
Yes, the parameters and local variables are local to each invocation, and this is usually achieved by creating a copy of each invocation's set of variables on the program stack. Yes, that consumes more memory compared to an implementation with a loop, but only if the problem can be solved with a loop and constant memory usage. Consider traversing a tree: you will have to store the tree elements somewhere, be it on the stack or in some other structure. The advantage of recursion is that it is often easier to implement (but not always easier to debug).
If you comment out return sum; in the second branch, the behavior is undefined: anything can happen, expected behavior included. That's not what you should do.
Many times I need to do things TWICE. I can simply set up a for loop with an iterator and go through it twice:
for (i = 0; i < 2; i++)
{
// Do stuff
}
Now I am interested in doing this as SIMPLY as I can, perhaps without an initializer or iterator? Are there any other, really simple and elegant, ways of achieving this?
This is elegant because it looks like a triangle; and triangles are elegant.
i = 0;
here: dostuff();
i++; if ( i == 1 ) goto here;
Encapsulate it in a function and call it twice.
void do_stuff() {
// Do Stuff
}
// .....
do_stuff();
do_stuff();
Note: if you use variables or parameters of the enclosing function in the stuff logic, you can pass them as arguments to the extracted do_stuff function.
If it's only twice, and you want to avoid a loop, just write the darn thing twice.
statement1;
statement1; // (again)
If the loop is too verbose for you, you can also define an alias for it:
#define TWICE for (int _index = 0; _index < 2; _index++)
You could then write:
TWICE {
// Do Stuff
}
// or
TWICE
func();
I would only recommend using this macro if you have to do this very often; otherwise, I think the plain for loop is more readable.
Unfortunately, this is for C++ only, not C, but it does exactly what you want:
Just include the header, and you can write something like this:
10 times {
// Do stuff
}
I'll try to rewrite it for C as well.
So, after some time, here's an approach that enables you to write the following in pure C:
2 times {
do_something();
}
Example:
You'll have to include this little thing as a simple header file (I always called the file extension.h). Then, you'll be able to write programs in the style of:
#include<stdio.h>
#include"extension.h"
int main(int argc, char** argv){
3 times printf("Hello.\n");
3 times printf("Score: 0 : %d\n", _);
2 times {
printf("Counting: ");
9 times printf("%d ", _);
printf("\n");
}
5 times {
printf("Counting up to %d: ", _);
_ times printf("%d ", _);
printf("\n");
}
return 0;
}
Features:
Simple notation of simple loops (in the style depicted above)
Counter is implicitly stored in a variable called _ (a simple underscore).
Nesting of loops allowed.
Restrictions (and how to (partially) circumvent them):
Works only for a certain number of loops (which is - "of course" - reasonable, since you only would want to use such a thing for "small" loops). Current implementation supports a maximum of 18 iterations (higher values result in undefined behaviour). Can be adjusted in header file by changing the size of array _A.
Only a certain nesting depth is allowed. Current implementation supports a nesting depth of 10. Can be adjusted by redefining the macro _Y.
Explanation:
You can see the full (=de-obfuscated) source-code here. Let's say we want to allow up to 18 loops.
Retrieving the upper iteration bound: The basic idea is to have an array of chars that are initially all set to 0 (this is the array counterarray). If we issue a call to, e.g., 2 times {do_it;}, the macro times sets element 2 of counterarray to 1 (i.e. counterarray[2] = 1). In C, it is possible to swap the index and the array name in such an assignment, so we can write 2[counterarray] = 1 to achieve the same. This is exactly what the macro times does as its first step. Then, we can later scan the array counterarray until we find an element that is not 0, but 1. The corresponding index is then the upper iteration bound. It is stored in the variable searcher. Since we want to support nesting, we have to store the upper bound for each nesting depth separately; this is done by searchermax[depth]=searcher+1.
Adjusting current nesting depth: As said, we want to support nesting of loops, so we have to keep track of the current nesting depth (done in the variable depth). We increment it by one if we start such a loop.
The actual counter variable: We have a "variable" called _ that implicitly gets assigned the current counter. In fact, we store one counter for each nesting depth (all stored in the array counter). Then, _ is just another macro that retrieves the proper counter for the current nesting depth from this array.
The actual for loop: We take the for loop into parts:
We initialize the counter for the current nesting depth to 0 (done by counter[depth] = 0).
The iteration step is the most complicated part: We have to check if the loop at the current nesting depth has reached its end. If so, we have to update the nesting depth accordingly. If not, we have to increment the current nesting depth's counter by 1. The variable lastloop is 1 if this is the last iteration, otherwise 0, and we adjust the current nesting depth accordingly. The main problem here is that we have to write this as a sequence of expressions, all separated by commas, which forces us to write all these conditions in a very convoluted way.
The "increment step" of the for loop consists of only one assignment, that increments the appropriate counter (i.e. the element of counter of the proper nesting depth) and assigns this value to our "counter variable" _.
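The first step of the explanation relies on the fact that a[i] is defined as *(a + i), so i[a] denotes the same element. Below is a simplified, hypothetical sketch of that mark-then-scan idea, not the header's actual code; marked_bound and MAX_ITER are names invented for the illustration.

```c
#define MAX_ITER 18   /* hypothetical bound, mirroring "up to 18" */

/* Marks index n using the swapped subscript form, then scans for the
   mark, the way the times macro recovers its iteration bound.
   Requires 0 <= n <= MAX_ITER. */
int marked_bound(int n)
{
    char counterarray[MAX_ITER + 1] = {0};
    n[counterarray] = 1;             /* identical to counterarray[n] = 1 */
    int searcher = 0;
    while (counterarray[searcher] == 0)
        searcher++;
    return searcher;                 /* equals n */
}
```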
What about this??
void DostuffFunction(){}
for (unsigned i = 0; i < 2; ++i, DostuffFunction());
Regards,
Pablo.
What abelenky said.
And if your { // Do stuff } is multi-line, make it a function, and call that function -- twice.
Many people suggest writing out the code twice, which is fine if the code is short. There is, however, a size of code block which would be awkward to copy but is not large enough to merit its own function (especially if that function would need an excessive number of parameters). My own normal idiom to run a loop 'n' times is
i = number_of_reps;
do
{
... whatever
} while(--i);
Partly because I'm frequently coding for an embedded system where the up-counting loop is often inefficient enough to matter, and partly because it's easy to see the number of repetitions. Running things twice is a bit awkward, because the most efficient coding on my target system
bit rep_flag;
rep_flag = 0;
do
{
...
} while(rep_flag ^= 1); /* Note: if loop runs to completion, leaves rep_flag clear */
doesn't read terribly well. Using a numeric counter suggests the number of reps can be varied arbitrarily, which in many instances won't be the case. Still, a numeric counter is probably the best bet.
As Edsger W. Dijkstra himself put it : "two or more, use a for". No need to be any simpler.
Another attempt:
for(i=2;i--;) /* Do stuff */
This solution has many benefits:
Shortest form possible, I claim (13 chars)
Still, readable
Includes initialization
The amount of repeats ("2") is visible in the code
Can be used as a toggle (1 or 0) inside the body e.g. for alternation
Works with single instruction, instruction body or function call
Flexible (doesn't have to be used only for "doing twice")
Dijkstra compliant ;-)
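A small sketch of the toggle property, with a hypothetical function twice_toggle: because the condition i-- tests the old value and then decrements, the body sees i == 1 on the first pass and i == 0 on the second.

```c
/* Runs the body twice, recording the value of i seen on each pass. */
int twice_toggle(int *log)
{
    int count = 0;
    int i;
    for (i = 2; i--; )
        log[count++] = i;   /* records 1, then 0 */
    return count;           /* number of passes */
}
```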
From comment:
for (i=2; i--; "Do stuff");
Use function:
func();
func();
Or use macro (not recommended):
#define DO_IT_TWICE(A) A; A
DO_IT_TWICE({ x+=cos(123); func(x); })
If your compiler supports this just put the declaration inside the for statement:
for (unsigned i = 0; i < 2; ++i)
{
// Do stuff
}
This is as elegant and efficient as it can be. Modern compilers can do loop unrolling and all that stuff, trust them. If you don't trust them, check the assembler.
And it has one little advantage to all other solutions, for everybody it just reads, "do it twice".
Assuming C++0x lambda support:
template <typename T> void twice(T t)
{
t();
t();
}
twice([](){ /*insert code here*/ });
Or:
twice([]()
{
/*insert code here*/
});
Which doesn't help you since you wanted it for C.
Good rule: three or more, do a for.
I think I read that in Code Complete, but I could be wrong. So in your case you don't need a for loop.
This is the shortest possible without preprocessor/template/duplication tricks:
for(int i=2; i--; ) /*do stuff*/;
Note that the decrement happens in the loop condition, before the body runs each time, which is why this loops precisely twice, with the indices 1 and 0.
Alternatively you can write
for(int i=2; i--; /*do stuff*/) ;
But that's purely a difference of taste.
If what you are doing is somewhat complicated wrap it in a function and call that function twice? (This depends on how many local variables your do stuff code relies on).
You could do something like
void do_stuff(int i){
// do stuff
}
do_stuff(0);
do_stuff(1);
But this may get extremely ugly if you are working on a whole bunch of local variables.
//dostuff
stuff;
//dostuff (Attention: I am doing the same stuff for the 2nd time)
stuff;
First, use a comment
/* Do the following stuff twice */
then,
1) use the for loop
2) write the statement twice, or
3) write a function and call the function twice
Do not use macros; as stated earlier, macros are evil.
(My answer's almost a triangle)
What is elegance? How do you measure it? Is someone paying you to be elegant? If so how do they determine the dollar-to-elegance conversion?
When I ask myself, "how should this be written," I consider the priorities of the person paying me. If I'm being paid to write fast code, control-c, control-v, done. If I'm being paid to write code fast, well.. same thing. If I'm being paid to write code that occupies the smallest amount of space on the screen, I short the stock of my employer.
Jump instructions have a cost, so if you write the lines one after the other, the code may run faster than a loop. But modern compilers are very, very smart and their optimizations are great (if they are allowed, of course). If you have turned on your compiler's optimizations, it usually doesn't matter how you write it, with a loop or not (:
EDIT: http://en.wikipedia.org/wiki/compiler_optimizations just take a look (:
Close to your example, elegant and efficient:
for (i = 2; i; --i)
{
/* Do stuff */
}
Here's why I'd recommend that approach:
It initializes the iterator to the number of iterations, which makes intuitive sense.
It uses decrement over increment so that the loop test expression is a comparison to zero (the "i;" can be interpreted as "is i true?" which in C means "is i non-zero"), which may optimize better on certain architectures.
It uses pre-decrement as opposed to post-decrement in the counting expression for the same reason (may optimize better).
It uses a for loop instead of do/while or goto or XOR or switch or macro or any other trick approach because readability and maintainability are more elegant and important than clever hacks.
It doesn't require you to duplicate the code for "Do stuff" so that you can avoid a loop. Duplicated code is an abomination and a maintenance nightmare.
If "Do stuff" is lengthy, move it into a function and give the compiler permission to inline it if beneficial. Then call the function from within the for loop.
I like Chris Case's solution (up here), but C language doesn't have default parameters.
My solution:
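#include <stdbool.h>   /* bool, true, false in C99 */
bool cy = false;
do {
// Do stuff twice
} while ((cy = !cy));  /* true after the 1st pass, false after the 2nd */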
If you want, you could do different things in the two cycles by checking the boolean variable (maybe with the ternary operator).
void loopTwice (bool first = true)
{
// Recursion is your friend
if (first) {loopTwice(false);}
// Do Stuff
...
}
I'm sure there's a more elegant way, but this is simple to read and pretty simple to write. There might even be a way to eliminate the bool parameter, but this is what I came up with in 20 seconds.
What are the alternatives to switch-case (and if-else) in C?
Function pointers are one alternative. Consider the following snippet that calls a function through a function pointer array:
#include <stdio.h>
void fn0(int n) { printf ("fn0, n = %d\n",n); }
void fn1(int n) { printf ("fn1, n = %d\n",n); }
void fn2(int n) { printf ("fn2, n = %d\n",n); }
void fn3(int n) { printf ("fn3, n = %d\n",n); }
static void (*fn[])(int) = {fn0, fn1, fn2, fn3};
int main(void) {
int i;
for (i = 0; i < 4; i++)
fn[i](10-i);
return 0;
}
This generates:
fn0, n = 10
fn1, n = 9
fn2, n = 8
fn3, n = 7
This sort of construct makes it very easy to implement things such as finite state machines where, instead of a massive switch statement or near-unmanageable nested if's, you can just use an integer state variable to index into an array of function pointers.
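As a hedged illustration of the state-machine idea, here is a hypothetical word counter (count_words, on_idle and on_word are invented for this sketch) in which the current state indexes a table of handler functions instead of selecting a case in a switch. Each handler returns the next state; DONE has no handler because the loop stops before dispatching on it.

```c
#include <stddef.h>

enum state { IDLE, IN_WORD, DONE };

typedef enum state (*handler)(char c, int *words);

static enum state on_idle(char c, int *words)
{
    if (c == '\0') return DONE;
    if (c == ' ') return IDLE;
    ++*words;                      /* first character of a new word */
    return IN_WORD;
}

static enum state on_word(char c, int *words)
{
    (void)words;
    if (c == '\0') return DONE;
    return c == ' ' ? IDLE : IN_WORD;
}

/* The state variable indexes this table instead of a switch. */
static const handler table[] = { [IDLE] = on_idle, [IN_WORD] = on_word };

int count_words(const char *s)
{
    int words = 0;
    enum state st = IDLE;
    size_t i = 0;
    while (st != DONE)
        st = table[st](s[i++], &words);
    return words;
}
```

Adding a state means adding one function and one table entry, rather than another case to a growing switch.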
You could always use gotos... :-p
Function pointers and a semi implementation of the strategy pattern :)
.. though you'll need some logic to determine which function to call
There are several different ways to handle conditional branch-and-switch scenarios in C.
The typical patterns, which you yourself mention, are switch( ) statements and if/else if/else groups. However, sometimes these flow control constructs are not the best choice for certain problems. Specifically cases such as:
High performance branching over a large domain
Branching on value domains only known at runtime
Changing the branch paths at runtime based on other conditions
In these cases, there are two patterns that I find helpful:
The Strategy pattern with a direct dispatch
The Strategy pattern with a chained dispatch
In the first approach, you map each value from your domain to a collection of function pointers. Each function handles a particular case (value) from your domain. This allows you to "jump" directly to the right handler for a particular case. This pattern works well when each case is separated from all the others and there is little or no overlapping logic.
In the second approach, you chain all of the dispatch methods together - and call each of them for all cases. Each dispatched method decides if it handles the case or not, and either returns immediately or performs some processing. This pattern is useful when there is overlap between the responsibilities of some of the handlers. It is somewhat less performant, since multiple handlers are invoked, and each decides whether it needs to perform its processing. However, this is one of the easier ways to deal with overlapping logic - the kind you could normally handle in a switch() statement with fall through (or jump) conditions.
You should only use one of these techniques if the problem really requires it, since they are less obvious to future developers and can introduce unnecessary complexity and maintenance problems if implemented poorly. They also make your code more difficult to understand than more common constructs like switch or if/else.
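A minimal sketch of the chained dispatch, assuming handlers that share one signature and an accumulator (all names here are hypothetical). The three handlers reproduce a fall-through switch: a value of 2 triggers all three, 1 triggers the last two, and 0 only the last.

```c
#include <stddef.h>

typedef void (*chain_handler)(int value, int *acc);

/* Each handler decides for itself whether the value concerns it. */
static void add_if_ge2(int value, int *acc) { if (value >= 2) *acc += 100; }
static void add_if_ge1(int value, int *acc) { if (value >= 1) *acc += 10; }
static void add_always(int value, int *acc) { (void)value; *acc += 1; }

static const chain_handler chain[] = { add_if_ge2, add_if_ge1, add_always };

int dispatch_chain(int value)
{
    int acc = 0;
    for (size_t h = 0; h < sizeof chain / sizeof chain[0]; h++)
        chain[h](value, &acc);     /* every handler runs; some do nothing */
    return acc;
}
```

Every handler is called for every value, which is the performance cost described above; in exchange, overlapping cases need no special fall-through logic.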