From the point of view of optimizing code run time, is there a rule of thumb for when to use nested "if" statements and when to use "switch case" statements?
I doubt you will ever find a real-life application where the difference between a nested if and a switch case is even worth measuring. Disk access, web access, etc. take many many orders of magnitude more time.
Choose what is easiest to read and debug.
Also see What is the difference between IF-ELSE and SWITCH? (possible duplicate) as well as Advantage of switch over if-else statement. Interestingly, a proponent of switch writes
In the worst case the compiler will generate the same code as a if-else chain, so you don't lose anything. If in doubt put the most common cases first into the switch statement.
In the best case the optimizer may find a better way to generate the code. Common things a compiler does is to build a binary decision tree (saves compares and jumps in the average case) or simply build a jump-table (works without compares at all).
If you have more than 2-3 comparisons
then "switch"
else "if"
Before you reach for a switch, try to apply a design pattern such as Strategy.
I don't believe it will make any difference for a decision structure that could be implemented using either method. It's highly likely that your compiler would produce the same instructions in the executable.
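To make the comparison concrete, here is a minimal sketch of the same decision written both ways. The function names are made up for illustration, and leap years are ignored. Whether a given compiler emits identical instructions for the two is something to check in its assembly output (for example with gcc -S), not something this sketch proves.

```c
/* Hypothetical example: days in a month (non-leap year), written as a switch. */
int days_in_month_switch(int month)
{
    switch (month) {
    case 2:
        return 28;
    case 4: case 6: case 9: case 11:
        return 30;
    case 1: case 3: case 5: case 7: case 8: case 10: case 12:
        return 31;
    default:
        return -1; /* not a valid month */
    }
}

/* The same decision written as an if / else if chain. */
int days_in_month_if(int month)
{
    if (month == 2)
        return 28;
    else if (month == 4 || month == 6 || month == 9 || month == 11)
        return 30;
    else if (month >= 1 && month <= 12)
        return 31;
    else
        return -1; /* not a valid month */
}
```

Both functions return the same value for every input, so the choice between them can be made on readability alone.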
As an example, assume that the expression sys->pot.atoms[item->P.kind].mass is evaluated inside a loop. The loop only changes item, so the expression can be simplified as atoms[item->P.kind].mass by defining a variable as atoms = sys->pot.atoms before the loop. Do modern compilers like gcc perform this kind of optimization automatically (if optimization is enabled)? And is it reliable regardless of the number of expressions like atoms[item->P.kind].mass existing inside a loop?
Yes it is a very common optimisation called Loop invariant code motion, also called hoisting or scalar promotion, often performed as a side effect of Common subexpression elimination.
It is valid to compute sys->pot.atoms just once before the loop if the compiler can ascertain that neither sys nor sys->pot.atoms can be modified inside the loop.
Note however, as commented by Groo, that if sys or sys->pot or sys->pot.atoms are specified as volatile, then it would be incorrect to compute it only once if the expression sys->pot.atoms is evaluated multiple times in the loop body or expressions.
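For reference, this is roughly what the hoisting looks like when done by hand. The struct layouts below are assumptions reconstructed from the expression in the question, and the names sim_system, total_mass_plain and total_mass_hoisted are made up for this sketch.

```c
#include <stddef.h>

/* Assumed layouts, guessed from sys->pot.atoms[item->P.kind].mass. */
struct atom       { double mass; };
struct potential  { struct atom *atoms; };
struct sim_system { struct potential pot; };
struct particle   { int kind; };
struct item       { struct particle P; };

/* The full expression is written out on every iteration. */
double total_mass_plain(const struct sim_system *sys,
                        const struct item *items, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += sys->pot.atoms[items[i].P.kind].mass;
    return total;
}

/* The loop-invariant part sys->pot.atoms is loaded once before the loop. */
double total_mass_hoisted(const struct sim_system *sys,
                          const struct item *items, size_t n)
{
    const struct atom *atoms = sys->pot.atoms; /* hoisted by hand */
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += atoms[items[i].P.kind].mass;
    return total;
}
```

With optimization enabled, a compiler that can prove nothing in the loop modifies sys or sys->pot.atoms is free to turn the first form into the second.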
It's a very common optimization.
And is it reliable regardless of the number of expressions
No, because optimizations are not something you can rely on in general. The C standard says nothing about them, so it's up to the maker of the compiler to give guarantees or not, and that is rarely done for the optimizer. The optimizer takes a "best effort" approach, and a missed optimization is usually treated as a flaw rather than an actual bug.
EDIT:
From discussion in comments, I found it useful to mention that just because a certain optimization was performed, that does not guarantee faster code. For instance, the benefit of loop unrolling is that the test in the loop does not need to be performed every iteration. But on the other hand, longer code can be less cache friendly. So asking if it's guaranteed that a certain optimization is performed or not does not really give any useful information.
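As an illustration of that trade-off, here is a hand-unrolled loop next to the plain one. This is only a sketch with made-up function names and an arbitrary unroll factor of 4. The unrolled version performs the loop test about a quarter as often but is longer, and whether it is actually faster can only be settled by measuring.

```c
#include <stddef.h>

/* Plain loop: one loop test per element. */
long sum_rolled(const int *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Unrolled by 4: one loop test per four elements, at the cost of more code. */
long sum_unrolled4(const int *v, size_t n)
{
    long total = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        total += (long)v[i] + v[i + 1] + v[i + 2] + v[i + 3];
    for (; i < n; i++) /* leftover elements when n is not a multiple of 4 */
        total += v[i];
    return total;
}
```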
I always wonder where I should do optimization myself, and where I should sit back, relax and leave it to the compiler.
That's very hard to know in advance. Guys like Linus Torvalds can basically see the assembly code in their head just by watching the C code, but for us mere mortals, it comes down to benchmarking and profiling.
Before even considering micro optimizations, perform these checks
Make sure that the code you're about to optimize actually is a bottleneck
Make sure you're using a good algorithm
Make sure the code is cache friendly
How does a switch statement immediately drop to the correct location in memory? With nested if-statements, it has to perform comparisons with each one, but with a switch statement it goes directly to the correct case. How is this implemented?
There are many different ways to compile a switch statement into machine code. Here are a few:
The compiler can produce a series of tests. Arranged as a binary search over the sorted case values, this is not so inefficient, as only about log2(N) tests are enough to dispatch a value among N possible cases.
The compiler can produce a table of values and jump addresses, which in turn will be used by generic lookup code (linear or dichotomic, similar to bsearch()) and finally jump to the corresponding location.
If the case values are dense enough, the compiler can generate a table of jump addresses and code that checks if the switch value is within a range encompassing all case values and jump directly to the corresponding address. This is probably the implementation closest to your description: "but with a switch statement it goes directly to the correct case".
Depending on the specific abilities of the target CPU, compiler settings, and the number and distribution of case values, the compiler might use one of the above approaches or another, or a combination of them, or even some other methods.
Compiler designers spend a great deal of effort trying to improve heuristics for these choices. Look at the assembly output or use an online tool such as Godbolt's Compiler Explorer to see various code generation possibilities.
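The jump-table approach can be imitated by hand in C with an array of function pointers, which may make the mechanism easier to picture. This is only an analogy with made-up handler names: a real compiler emits a table of code addresses inside one function, not function calls.

```c
#include <stddef.h>

typedef int (*handler_fn)(int);

static int op_add_one(int x) { return x + 1; }
static int op_double(int x)  { return x * 2; }
static int op_negate(int x)  { return -x; }

/* Dense case values 0, 1, 2 map directly to table slots. */
static const handler_fn handlers[] = { op_add_one, op_double, op_negate };

int dispatch(unsigned op, int x)
{
    /* Range check first, like the check a compiler emits before the jump. */
    if (op >= sizeof handlers / sizeof handlers[0])
        return x;            /* plays the role of the default case */
    return handlers[op](x);  /* one indexed load, one indirect jump */
}
```

No comparison against individual case values takes place: the cost is the same whether there are 3 handlers or 300.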
If I have an integer variable like int a=4, and in the switch statement I write
int b = something;
...
switch(a)
{
case 4+b: printf("hii");
}
then why is this a compile-time error saying that variables cannot be used inside a case label? Why does the compiler not substitute the values in place of the variables?
So basically, what problem would it create that made the language designers leave it out as valid syntax?
The initial idea of the switch control-flow statement was that it should determine the appropriate case very quickly, while potentially having a lot of cases.
A traditional implementation would use a jump table, making it an O(1) operation. The jump table is essentially an array of pointers, where each pointer contains the address of the first instruction for each case. Jumping to the appropriate case is as simple as indexing that array with the switch value and then doing a jump instruction to that address.
If the cases were allowed to contain variables, the compiler would have to emit code that first evaluates these expressions and then compares the switch value against more than one other value. If that was the case, a switch statement would be just a syntactically-sugarized version of a chain of if and else if.
switch statements are usually at the heart of any algorithm which implements a finite-state machine (like parsers), so that was a good reason to include it into the language. Most modern compilers would probably generate identical machine code for a chain of if and else if which are only testing a variable against a constant, but that wasn't the case in the early 1970s when C was conceived. Moreover, switch gives you the ability to fall-through which isn't possible in the latter arrangement.
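Fall-through is easiest to see in a small example. The following sketch uses made-up permission flags: a higher level enters the switch further up and then runs through every case below it.

```c
/* Hypothetical permission bits for this sketch. */
enum { PERM_READ = 1, PERM_WRITE = 2, PERM_ADMIN = 4 };

int access_flags(int level)
{
    int flags = 0;
    switch (level) {
    case 3:
        flags |= PERM_ADMIN;
        /* fall through */
    case 2:
        flags |= PERM_WRITE;
        /* fall through */
    case 1:
        flags |= PERM_READ;
        break;
    default:
        break; /* unknown level: no permissions */
    }
    return flags;
}
```

Writing the same thing as an if / else if chain would require repeating the lower-level assignments in each branch.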
case 2+a: doSomething();
break;
case 4-a: doSomethingElse();
break;
What do you do when a==1?
There are several possible answers, including
Run all applicable cases, in order
Run all applicable cases, in arbitrary order
Run the first applicable case
Run any one applicable case
The behaviour is undefined
Raise a well-defined error
The problem is, none of the resolutions is preferred over the others. Moreover, all run contrary to the original simple rationale of the switch statement, which is providing a high(ish) level abstraction of a fast, precomputed indexed jump table.
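If you do need variable "labels", an if / else if chain expresses them today, and it silently picks one of the resolutions above, namely "run the first applicable case". A minimal sketch, with a made-up function name and return codes:

```c
/* Returns 1 if the first branch ran, 2 if the second ran, 0 if neither. */
int pick_branch(int x, int a)
{
    if (x == 2 + a)
        return 1;   /* stands in for doSomething() */
    else if (x == 4 - a)
        return 2;   /* stands in for doSomethingElse() */
    return 0;
}
```

When a == 1, both conditions test x against 3, and the chain always takes the first branch. The second can never run in that situation.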
Because it is usually superfluous, and on a compiler level you want a jump to a fixed address. Just put the dependency of the variable in the switch expression
switch(a-b)
{
case 4: printf("hii");
}
I recently found this theorem here, (at the bottom):
Any program can be transformed into a semantically equivalent program of one procedure containing one switch statement inside a while loop.
The article went on to say:
A corollary to this theorem is that any program can be rewritten into a program consisting of a single recursive function containing only conditional statements
My questions are: are both these theorems applicable today? Does transforming a program this way bring any benefits? I mean to say, is such code optimized? (Although recursive calls are slower, I know.)
I read, from here, that switch-cases are almost always faster when optimized by the compiler. Does that make a difference?
PS: I'm trying to get some idea about compiler optimizations from here
And I've added the c tag as that's the only language I've seen optimized.
It's true. A Turing machine is essentially a switch statement on symbols that repeats forever, so the theorem follows pretty directly from the fact that Turing machines can compute anything computable. A switch statement is just a bunch of conditionals, so you can clearly write such a program as a loop with just conditionals. Once you have that, making the loop from recursion is pretty easy, although you may have to pass a lot of state variables as parameters if your language doesn't have true lexical scoping.
There's little reason to do any of this in practice. Such programs generally operate more slowly than the originals, and may take more space. So why would you possibly slow your program down, and/or make its load image bigger?
The only place this makes sense is if you intend to obfuscate the code. This kind of technique is often used as "control flow obfuscation".
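To show what the transformation looks like, here is a minimal sketch that flattens an ordinary loop (summing 1 to n) into a single while loop around a single switch. The state names and the function are made up for this example. A local variable plays the role of a program counter.

```c
/* Each state corresponds to one basic block of the original program. */
enum state { ST_INIT, ST_TEST, ST_BODY, ST_DONE };

/* Equivalent to: total = 0; for (i = 1; i <= n; i++) total += i; */
unsigned sum_to_n_flat(unsigned n)
{
    unsigned i = 0, total = 0;
    enum state pc = ST_INIT;

    while (pc != ST_DONE) {
        switch (pc) {
        case ST_INIT: i = 1; total = 0;            pc = ST_TEST; break;
        case ST_TEST: pc = (i <= n) ? ST_BODY : ST_DONE;         break;
        case ST_BODY: total += i; i++;             pc = ST_TEST; break;
        case ST_DONE:                                            break;
        }
    }
    return total;
}
```

The result is the same as the plain loop, but the control flow is now hidden in the values assigned to pc, which is exactly why the technique is useful for obfuscation and rarely for anything else.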
This is basically what happens when a compiler translates a program into machine code. The machine code runs on a processor, which executes instructions one-by-one in a loop. The complex structure of the program has become part of the data in memory.
Recursive loops through a switch statement can be used to create a rudimentary virtual machine. If your virtual machine is Turing complete then, in theory, any program could be rewritten to work on this machine.
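int program[] = {
    PUSH,
    ADD
    ....
};
int *opcode = program;  /* next instruction; an array name itself cannot be incremented */
while (true) {
    switch (*opcode++) {
    case PUSH:
        *stack++ = <var>;
        break;
    case ADD:
        /* stack points one past the top, so the operands are stack[-2] and stack[-1] */
        stack[-2] += stack[-1];
        --stack;
        break;
    ....
    }
}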
Of course writing a compiler for this virtual machine would be another matter.
:-)
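For anyone who wants to try the idea, here is a self-contained variant of that loop. It is a sketch under several assumptions that are not in the answer above: PUSH takes its operand from the next slot of the instruction stream, a HALT opcode ends the program, the stack has a fixed size, and there is no overflow or underflow checking.

```c
enum opcode { OP_PUSH, OP_ADD, OP_HALT };

/* Runs a well-formed program and returns the value left on top of the stack. */
int run_program(const int *code)
{
    int stack[16];
    int *sp = stack;               /* points one past the top of the stack */

    for (;;) {
        switch (*code++) {
        case OP_PUSH:
            *sp++ = *code++;       /* operand follows the opcode */
            break;
        case OP_ADD:
            sp[-2] += sp[-1];      /* add the two topmost values */
            --sp;
            break;
        case OP_HALT:
            return sp > stack ? sp[-1] : 0;
        default:
            return 0;              /* unknown opcode: give up */
        }
    }
}
```

A program such as { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_HALT } leaves 5 on the stack.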
I have written embedded software (in C, of course) and now I'm considering ways to improve the running time of the system. The most important single module in my system is one very large nested for loop module.
That module consists of two nested for loops that iterate at most 122500 times. That's not very much yet, but the problem is that inside the nested loops I call a function that lives in another source file. That function consists mostly of two more nested for loops, which always iterate 22500 times. So now I have to make a function call 122500 times.
I have made the called function a lot lighter and shorter (it still works as it should), and now I've started to wonder: would it be faster to remove the function call and write its body directly inside the first two for loops?
The processor in that system is ARM7TDMI and its frequency is 55MHz. The system itself isn't very time critical so it doesn't have to be real time capable. However the faster it can process its duties the better.
Would it also be faster to use while loops instead of for loops? Any advice on how to improve the running time is appreciated.
-zaplec
TRY IT AND SEE!!
It'll almost certainly make a difference. Function call overhead isn't usually that much of an issue, but at over 100K repetitions it starts to add up.
...But whether or not it makes any real-world difference is something only you can answer, after trying it and timing the results.
As for for vs while... it shouldn't matter unless you actually change the behavior when changing the loop. If in doubt, make your compiler spit out assembler code for both and compare... or just change it and time it.
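For the record, this is the kind of pair to compare. The function names are made up. Both loops do the same work in the same order, so a compiler has no reason to treat them differently, but the assembly listing is the only real proof for your toolchain.

```c
/* Sum of squares 0^2 + 1^2 + ... + (n-1)^2 with a for loop. */
long sum_squares_for(int n)
{
    long total = 0;
    for (int i = 0; i < n; i++)
        total += (long)i * i;
    return total;
}

/* The same computation with a while loop. */
long sum_squares_while(int n)
{
    long total = 0;
    int i = 0;
    while (i < n) {
        total += (long)i * i;
        i++;
    }
    return total;
}
```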
You need to be careful with the optimizations you make, because it isn't always clear which optimizations the compiler is already making for you. Premature optimization is a common mistake. Is it more important that your code is readable and easily maintained, or that it is slightly faster? Like others have suggested, the best approach is to benchmark the different ways and see if there is a noticeable difference.
If you don't believe your compiler does much in the way of optimization I would look at some older concepts in optimizing C (searches on SO or google should provide some good links).
The ARM processor has an instruction pipeline. When the processor takes a branch (such as a call), it must flush the pipeline and refill it, which wastes some cycles. One objective when optimizing for speed is to reduce the number of pipeline refills. This means reducing the number of branches taken.
As others have stated in SO, compile your code with optimization set for speed, and profile. I prefer to look at the assembly language listing as well (either printed from the compiler or displayed interwoven in the debugger). Use this as a baseline. If you can't profile, you can use assembly instruction counting as a rough estimate.
The next step is to reduce the number of branches, or the number of times a branch is taken. Unrolling loops helps to reduce the number of times a branch is taken. Inlining helps reduce the number of branches. Before applying these fine-tuning techniques, review the design and code implementation to see if branches can be reduced. For example, reduce the number of "if" statements by using Boolean arithmetic or Karnaugh maps. My favorite is reducing requirements and eliminating code that doesn't need to be executed.
In the code implementation, move code that doesn't change outside of the for or while loops. Some loops may be reduced to equations (for example, replacing a loop of additions with a multiplication). Also, reduce the number of iterations by asking "does this loop really need to be executed this many times?".
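As a small illustration of reducing a loop to an equation, the sketch below replaces a loop of additions with a single multiplication. The function names are made up, and the example assumes the result fits in an unsigned long.

```c
/* Adds step to the total n times: n iterations, n branches. */
unsigned long add_repeated(unsigned step, unsigned n)
{
    unsigned long total = 0;
    for (unsigned i = 0; i < n; i++)
        total += step;
    return total;
}

/* The same result with no loop at all. */
unsigned long add_closed_form(unsigned step, unsigned n)
{
    return (unsigned long)step * n;
}
```

One caveat for the ARM7TDMI discussed here: a multiply takes several cycles on that core, but it still beats a loop of any meaningful length.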
Another technique is to optimize for Data Oriented Design. Also check this reference.
Just remember to set a limit for optimizing. This is where you decide any more optimization is not generating any ROI or customer satisfaction. Also, apply optimizations in stages; which will allow you to have a deliverable when your manager asks for one.
Run a profiler on your code. If you are just guessing at where you are spending your time, you are probably wrong. A profiler will show what function is taking the most time and you can focus on that. You could be doing something in the function that takes longer than the function call itself. Did you look to see if you can change floating operations to integer, or integer math to shifts? You can spend a lot of time fiddling with things that don't make much difference. Run a profiler on your code and know for sure that the things you are changing will make a difference.
For function vs. inline, unfortunately there is no easy answer. I.e. it depends. See this FAQ. For "for" vs. "while", I wouldn't think there is any significant difference in performance.
In general, a function call should have more overhead than inlining. You really should profile however, as this can be affected quite a bit by your compiler (especially the compile/optimization settings). Some compilers will automatically inline code for example.
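One thing that can matter in the situation described in the question: a compiler usually cannot inline a function whose body lives in another source file unless link-time optimization is enabled. A common workaround is to put the function in the same translation unit (or in a header) as static inline. The sketch below uses a made-up helper, scale, to show that form next to the manually inlined one. Note that inline is only a hint, and the compiler may ignore it.

```c
/* Visible in this translation unit, so the compiler is free to inline it. */
static inline int scale(int v)
{
    return 3 * v + 1;
}

/* Keeps the function call in the source. */
long total_with_call(const int *data, int n)
{
    long total = 0;
    for (int i = 0; i < n; i++)
        total += scale(data[i]);
    return total;
}

/* The same loop with the body written out by hand. */
long total_inlined_by_hand(const int *data, int n)
{
    long total = 0;
    for (int i = 0; i < n; i++)
        total += 3 * data[i] + 1;
    return total;
}
```

If the optimized assembly for the two loops comes out the same, keeping the function is the more readable choice.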