The GNU C extensions include "labels as values", so that label addresses can be stored in variables and used as goto targets. While I acknowledge that goto makes sense in certain situations (e.g. as a substitute for exception handling in higher-level languages), I do not understand how this language extension is justified. Can you provide a concrete example where label values provide benefits?
The one time I used it to good effect was for threaded dispatch. Imagine an interpreter's inner loop:
while (1) {
    switch (*instruction_pointer) {
    case INSTR_A:
        ...
        break;
    case INSTR_B:
        ...
        break;
    ...
    }
    ++instruction_pointer;
}
The biggest performance problem with the looping construct itself is that there is (ideally) a single indirect branch for that switch statement, and it has to handle the dispatch for every instruction. That branch can never be predicted well. With threaded dispatch you add explicit dispatch code to every case to jump to the next instruction's handler:
void *instructions[] = { &&instr_a, &&instr_b, ... };
...
goto *instructions[*instruction_pointer];
instr_a:
...
goto *instructions[*++instruction_pointer];
instr_b:
...
goto *instructions[*++instruction_pointer];
Each instruction is able to jump directly to the start of the next instruction. Common sequences of instructions are faster due to CPU branch prediction. And it guarantees a jump table implementation, where the switch might not work out that way if the instruction space is slightly sparse.
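For a self-contained illustration, here is a minimal sketch of the same technique that compiles with GCC or Clang (it relies on the labels-as-values extension); the opcode names and the tiny example program are invented for illustration:

#include <stdio.h>

enum { OP_INC, OP_DEC, OP_PRINT, OP_HALT };

int run(const unsigned char *ip)
{
    /* One label address per opcode, indexed directly by the opcode value. */
    static void *dispatch[] = { &&op_inc, &&op_dec, &&op_print, &&op_halt };
    int acc = 0;

    goto *dispatch[*ip];            /* jump to the handler of the first instruction */

op_inc:
    ++acc;
    goto *dispatch[*++ip];          /* jump straight to the next handler */
op_dec:
    --acc;
    goto *dispatch[*++ip];
op_print:
    printf("%d\n", acc);
    goto *dispatch[*++ip];
op_halt:
    return acc;
}

int main(void)
{
    const unsigned char program[] = { OP_INC, OP_INC, OP_DEC, OP_PRINT, OP_HALT };
    return run(program) == 1 ? 0 : 1;
}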
Related
If I have an integer variable like int a = 4, and in a switch case I write
int b = something;
...
switch(a)
{
case 4+b: printf("hii");
}
then why is this a compile-time error saying that variables cannot be used in a case label? Why doesn't the compiler substitute the values in place of the variables?
So basically, what problem would this create that made the language designers not include it as proper syntax?
The initial idea of the switch control-flow statement was that it should determine the appropriate case very quickly, while potentially having a lot of cases.
A traditional implementation would use a jump table, making it an O(1) operation. The jump table is essentially an array of pointers, where each pointer contains the address of the first instruction for each case. Jumping to the appropriate case is as simple as indexing that array with the switch value and then doing a jump instruction to that address.
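As a rough sketch of what such a table amounts to, here is the idea modelled with function pointers in plain C (the real thing is an array of code addresses emitted by the compiler, and the handler names here are invented):

#include <stdio.h>

static void case0(void) { puts("case 0"); }
static void case1(void) { puts("case 1"); }
static void case2(void) { puts("case 2"); }

/* One slot per case, indexed directly by the controlling value. */
static void (*const jump_table[])(void) = { case0, case1, case2 };

void dispatch(unsigned v)
{
    /* A real compiler emits a similar range check before indexing the table. */
    if (v < sizeof jump_table / sizeof jump_table[0])
        jump_table[v]();    /* O(1): one index operation, one indirect jump */
}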
If the cases were allowed to contain variables, the compiler would have to emit code that first evaluates these expressions and then compares the switch value against more than one other value. If that were the case, a switch statement would be just a syntactically sugared version of a chain of if and else if statements.
switch statements are usually at the heart of any algorithm which implements a finite-state machine (like parsers), so that was a good reason to include them in the language. Most modern compilers would probably generate identical machine code for a chain of if and else if statements that only test a variable against constants, but that wasn't the case in the early 1970s when C was conceived. Moreover, switch gives you the ability to fall through, which isn't possible in the latter arrangement.
Suppose variables were allowed in case labels:
case 2+a: doSomething();
break;
case 4-a: doSomethingElse();
break;
What do you do when a==1?
There are several possible answers, including
Run all applicable cases, in order
Run all applicable cases, in arbitrary order
Run the first applicable case
Run any one applicable case
The behaviour is undefined
Raise a well-defined error
The problem is, none of these resolutions is preferred over the others. Moreover, all run contrary to the original simple rationale of the switch statement, which is providing a high(ish) level abstraction of a fast, precomputed indexed jump table.
Because it is usually superfluous, and at the compiler level you want a jump to a fixed address. Just put the dependency on the variable into the switch expression:
switch(a-b)
{
case 4: printf("hii");
}
To learn more about the CPU and code optimization I have started to study Assembly programming. I have also read about clever optimizations like "branch prediction" that the CPU does to speed itself up.
My question might seem foolish since I do not know the subject very well yet.
I have a very vague memory that I have read somewhere (on the internet) that goto statements will decrease the performance of a program because it does not work well with the branch prediction in the CPU. This might however just be something that I made up and did not actually read.
I think that it could be true.
I hope this example (in pseudo-C) will clarify why I think that is so:
int function(...) {
    VARIABLES DECLARED HERE

    if (HERE IS A TEST) {
        CODE HERE ...
    } else if (ANOTHER TEST) {
        CODE HERE ...
    } else {
        /*
         * Let us assume that the CPU was smart and predicted this path.
         * What about the jump to `label`?
         * Is it possible for the CPU to "pre-fetch" the instructions over there?
         */
        goto label;
    }

    CODE HERE...

label:
    CODE HERE...
}
To me this seems like a very complex task, because the CPU would need to look up where the goto jumps to in order to pre-fetch the instructions at that location.
Do you know anything about this?
Unconditional branches are not a problem for the branch predictor, because the branch predictor doesn't have to predict them.
They add a bit of complexity to the speculative instruction fetch unit, because the existence of branches (and other instructions which change the instruction pointer) means that instructions are not always fetched in linear order. Of course, this applies to conditional branches too.
Remember, branch prediction and speculative execution are different things. You don't need branch prediction for speculative execution: you can just speculatively execute code assuming that branches are never taken, and if you ever do take a branch, cancel out all the operations from beyond that branch. That would be a particularly stupid thing to do in the case of unconditional branches, but it would keep the logic nice and simple. (IIRC, this was how the first pipelined processors worked.)
(I guess you could have branch prediction without speculative execution, but there wouldn't really be a point to it, since the branch predictor wouldn't have anybody to tell its predictions to.)
So yes, branches -- both conditional and unconditional -- increase the complexity of instruction fetch units. That's okay. CPU architects are some pretty smart people.
EDIT: Back in the bad old days, it was observed that the use of goto statements could adversely affect the ability of the compilers of the day to optimize code. This might be what you were thinking of. Modern compilers are much smarter, and in general are not taken too much aback by goto.
Due to pipelining and similar activities, the branch instruction could actually be placed several instructions before the location where the actual branch is to occur (this is part of the branch scheduling done by the compiler). A goto statement is just a jump instruction.
As a side note: given structured programming concepts and code clarity, readability, and maintainability considerations, the goto statement should never be used.
On most CPUs, any jump/call/return type of instruction will flush the prefetch cache and then reload that cache from the new location, if the new location is not already in the cache.
Note: for small loops, which will always contain at least one jump instruction, many CPUs have an internal buffer that the programmer can exploit so that small loops only perform one prefetch sequence and therefore execute significantly faster.
MISRA rule 14.5 says the continue statement must not be used. Can anyone explain the reason?
Thank you.
It is because of the ancient debate about goto, unconditional branching and spaghetti code, that has been going on for 40 years or so. goto, continue, break and multiple return statements are all considered more or less equally bad.
The consensus of the world's programming community has roughly ended up something like this: we recognize that you can use these features of the language without writing spaghetti code if you know what you are doing. But we still discourage them, because there is a large chance that someone who doesn't know what they are doing is going to use them if they are available, and then create spaghetti. And we also discourage them because they are superfluous features: you can obviously write programs without using them.
Since MISRA-C is aimed towards critical systems, MISRA-C:2004 has the approach to ban as many of these unconditional branch features as possible. Therefore, goto, continue and multiple returns were banned. break was only allowed if there was a single break inside the same loop.
However, in the "MISRA-C:2011" draft which is currently under evaluation, the committee has considered allowing all these features again, with a restriction that goto should only be allowed to jump downwards and never upwards. The rationale from the committee was that there are now tools (i.e. static analysers) smart enough to spot bad program flow, so the keywords can be allowed.
The goto debate is still going strong...
Programming in C makes it notoriously hard to keep track of multiple execution branches. If you allocate resources somewhere, you have to release them elsewhere, non-locally. If your code branches, you will in general need to have separate deallocation logic for each branch or way to exit a scope.
The continue statement adds another way to exit the scope of a for loop. This makes it harder to reason about the loop and to understand all the possible ways in which control can flow through it, which in turn makes it harder to ascertain that your code behaves correctly in all circumstances.
This is just speculation on my part, but I imagine that trying to limit complexity coming from this extra branching behaviour is the driving reason for the rule that you mention.
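To illustrate the point about resource release across branches, here is a small hedged sketch in C: once continue is in play, every early exit from the loop body has to release the buffer itself, or it leaks. The helper names (load_item, process) and the item size are hypothetical:

#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins so the sketch is self-contained. */
enum { ITEM_SIZE = 64 };
static int  load_item(int item, char *buf) { memset(buf, 0, ITEM_SIZE); return item >= 0; }
static void process(const char *buf)       { (void)buf; }

void handle_all(const int *items, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        char *buf = malloc(ITEM_SIZE);
        if (buf == NULL)
            continue;                 /* nothing allocated yet, safe to skip */

        if (!load_item(items[i], buf)) {
            free(buf);                /* every extra exit needs its own cleanup */
            continue;
        }

        process(buf);
        free(buf);                    /* the "normal" release on the main path */
    }
}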
I've just run into it. We have items, which
should be checked for several things,
checks require some preparation,
we should apply cheap checks first, then go with expensive ones,
some checks depend on others,
if an item fails any check, it should be logged,
if the item passes all the checks, it should be passed to further processing.
Watch this, without continue:
foreach (items) {
prepare check1
if (check1) {
prepare check2
if (check2) {
prepare check3
if (check3) {
log("all checks passed")
process_good_item(item)
} else {
log("check3 failed")
}
} else {
log("check2 failed")
}
} else {
log("check 1 failed")
}
}
...and compare with this, with continue:
foreach (items) {
prepare check1
if (!check1) {
log("check 1 failed")
continue
}
prepare check2
if (!check2) {
log("check 2 failed")
continue
}
prepare check3
if (!check3) {
log("check 3 failed")
continue
}
log("all checks passed")
process_good_item(item)
}
Assume that "prepare"-s are multiple line long each, so you can't see the whole code at once.
Decide for yourself which one is
less complex, with a simpler execution graph
lower in cyclomatic complexity
more readable and more linear, with no "eye jumps"
easier to extend (e.g. try to add check4, check5, check12)
IMHO MISRA is wrong on this topic.
As with all MISRA rules, if you can justify it, you can deviate from the rule (section 4.3.2 of MISRA-C:2004)
The point behind MISRA (and other similar guidelines) is to trap the things that generally cause problems... yes, continue can be used properly, but the evidence suggested that it was a common cause of problems.
As such, MISRA created a rule to prevent its (ab)use, and the reviewing community approved the rule. And the views of the user community are generally supportive of the rule.
But I repeat, if you really want to use it, and you can justify it to your company hierarchy, deviate.
I recently found this theorem here (at the bottom):
Any program can be transformed into a semantically equivalent program of one procedure containing one switch statement inside a while loop.
The article went on to say:
A corollary to this theorem is that any program can be rewritten into a program consisting of a single recursive function containing only conditional statements
My questions are: are both these theorems applicable today? Does transforming a program in this way reap any benefits? I mean to say, is such code optimized? (Although recursive calls are slower, I know.)
I read, from here, that switch-cases are almost always faster when optimized by the compiler. Does that make a difference?
PS: I'm trying to get some idea about compiler optimizations from here
And I've added the c tag as that's the only language I've seen optimized.
It's true. A Turing machine is essentially a switch statement on symbols that repeats forever, so this rests pretty directly on the fact that Turing machines can compute anything computable. A switch statement is just a bunch of conditionals, so you can clearly write such a program as a loop with just conditionals. Once you have that, turning the loop into recursion is pretty easy, although you may have to pass a lot of state variables as parameters if your language doesn't have true lexical scoping.
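As a toy illustration of the corollary (the names and the two-state machine are invented), here is a loop-plus-switch version and an equivalent single recursive function that uses only conditionals:

#include <stdio.h>

/* Loop + switch version: counts down, alternating between two states. */
static void run_loop(int state, int n)
{
    while (n > 0) {
        switch (state) {
        case 0: printf("tick %d\n", n); state = 1; break;
        case 1: printf("tock %d\n", n); state = 0; break;
        }
        --n;
    }
}

/* Equivalent single recursive function containing only conditionals. */
static void run_recursive(int state, int n)
{
    if (n <= 0)
        return;
    if (state == 0)
        printf("tick %d\n", n);
    else
        printf("tock %d\n", n);
    run_recursive(!state, n - 1);
}

int main(void)
{
    run_loop(0, 3);         /* prints: tick 3, tock 2, tick 1 */
    run_recursive(0, 3);    /* prints the same sequence */
    return 0;
}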
There's little reason to do any of this in practice. Such programs generally operate more slowly than the originals, and may take more space. So why would you possibly slow your program down, and/or make its load image bigger?
The only place this makes sense is if you intend to obfuscate the code. This kind of technique is often used as "control flow obfuscation".
This is basically what happens when a compiler translates a program into machine code. The machine code runs on a processor, which executes instructions one-by-one in a loop. The complex structure of the program has become part of the data in memory.
Looping over a switch statement can be used to create a rudimentary virtual machine. If your virtual machine is Turing complete then, in theory, any program could be rewritten to work on this machine.
int program[] = {
    PUSH,
    ADD,
    ....
};
int *opcode = program;          /* instruction pointer */

while (true) {
    switch (*opcode++) {
    case PUSH:
        *stack++ = <var>;       /* push; stack points one past the top */
        break;
    case ADD:
        stack[-2] += stack[-1]; /* pop the top two values, push their sum */
        --stack;
        break;
    ....
    }
}
Of course writing a compiler for this virtual machine would be another matter.
:-)
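For the curious, a complete, runnable version of that sketch might look something like this; the instruction set and the tiny program (which computes 2 + 3) are invented for illustration:

#include <stdio.h>

enum { PUSH, ADD, PRINT, HALT };

int main(void)
{
    int program[] = { PUSH, 2, PUSH, 3, ADD, PRINT, HALT };
    int stack[16];
    int *sp = stack;              /* points one past the top element */
    int *pc = program;

    while (1) {
        switch (*pc++) {
        case PUSH:
            *sp++ = *pc++;        /* the operand follows the opcode */
            break;
        case ADD:
            sp[-2] += sp[-1];     /* pop two values, push their sum */
            --sp;
            break;
        case PRINT:
            printf("%d\n", sp[-1]);
            break;
        case HALT:
            return 0;
        }
    }
}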
What is the best implementation, from a performance point of view, of branched function calls?
In the naive case we have a rather large switch statement that interprets bytecode and executes a function call depending on the opcode.
In the normal case we have computed gotos and labels that do the same thing.
What is the absolute best way to do this?
An abstract example,
schedule:
swap_entity();
goto *entity_start();
lb_code1:
do_stuff();
goto *next_code_item();
lb_code2:
do_stuff();
goto *next_code_item();
...
Edit: My reference to "branched function calls" was perhaps somewhat erroneous. Branched code execution.
Maybe an array of function pointers, at a guess:
void dispatch(Message* message)
{
    // MessageType is a finite enum
    MessageType messageType = message->messageType;
    int index = (int)messageType;

    // there's an array element for each enum value
    FunctionPointer functionPointer = arrayOfFunctionPointers[index];
    (*functionPointer)(message);
}
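For context, supporting declarations along these lines would make the fragment above compile; every name here is an assumption mirroring the sketch (not a real API), and they would sit before dispatch():

// Hypothetical message types; one enum value per kind of message.
typedef enum { MSG_PING, MSG_DATA, MSG_CLOSE, MSG_COUNT } MessageType;

typedef struct Message {
    MessageType messageType;
    // payload fields would go here
} Message;

typedef void (*FunctionPointer)(Message *);

static void handlePing(Message *m)  { (void)m; }
static void handleData(Message *m)  { (void)m; }
static void handleClose(Message *m) { (void)m; }

// One slot per enum value, in the same order as the enum.
static FunctionPointer arrayOfFunctionPointers[MSG_COUNT] = {
    handlePing, handleData, handleClose
};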
The actual answer is hardware-dependent, and depends on things like the size of the problem and the CPU's cache.
It depends. Some table-driven approach will normally be fastest, but you may well find that is what your switch statement is implemented as anyway. Certainly, you should not take it as read that ANY recommendation in this area from SO users is the best. If we suggest something, you need to implement it and measure the performance in a build with all compiler optimisations turned on.
If you're looking for a speed boost here, you should look at other bytecode dispatch mechanisms. There was a question which sort-of asked that before.
Basically, you now have a goto which is probably incorrectly predicted every time, followed by a function call. With a technique like direct threading, you can probably reduce your interpreter overhead significantly. Inline threading is harder, but with greater benefit.
I gave some further resources in the other question.