gcc removes inline assembler code - c

It seems like gcc 4.6.2 removes code it considers unused from functions.
test.c
int main(void) {
goto exit;
handler:
__asm__ __volatile__("jmp 0x0");
exit:
return 0;
}
Disassembly of main()
0x08048404 <+0>: push ebp
0x08048405 <+1>: mov ebp,esp
0x08048407 <+3>: nop # <-- This is all that's left of my jmp.
0x08048408 <+4>: mov eax,0x0
0x0804840d <+9>: pop ebp
0x0804840e <+10>: ret
Compiler options
No optimizations enabled, just gcc -m32 -o test test.c (-m32 because I'm on a 64 bit machine).
How can I stop this behavior?
Edit: Preferably by using compiler options, not by modifying the code.

Looks like that's just the way it is: when gcc sees that code within a function is unreachable, it removes it. Other compilers might be different.
In gcc, an early phase of compilation builds the "control flow graph", a graph of "basic blocks", each free of conditions, connected by branches. When emitting the actual code, parts of the graph that are not reachable from the root are discarded.
This isn't part of the optimization phase, and is therefore unaffected by compilation options.
So any solution would involve making gcc think that the code is reachable.
My suggestion:
Instead of putting your assembly code in an unreachable place (where GCC may remove it), you can put it in a reachable place, and skip over the problematic instruction:
int main(void) {
goto exit;
exit:
__asm__ __volatile__ (
"jmp 1f\n"
"jmp $0x0\n"
"1:\n"
);
return 0;
}
Also, see this thread about the issue.

I do not believe there is a reliable way using just compile options to solve this. The preferable mechanism is something that will do the job and work on future versions of the compiler regardless of the options used to compile.
Commentary about Accepted Answer
In the accepted answer there is an edit to the original that suggests this solution:
int main(void) {
__asm__ ("jmp exit");
handler:
__asm__ __volatile__("jmp $0x0");
exit:
return 0;
}
First off jmp $0x0 should be jmp 0x0. Secondly C labels usually get translated into local labels. jmp exit doesn't actually jump to the label exit in the C function, it jumps to the exit function in the C library effectively bypassing the return 0 at the bottom of main. Using Godbolt with GCC 4.6.4 we get this non-optimized output (I have trimmed the labels we don't care about):
main:
pushl %ebp
movl %esp, %ebp
jmp exit
jmp 0x0
.L3:
movl $0, %eax
popl %ebp
ret
.L3 is actually the local label for exit. You won't find the exit label in the generated assembly. It may compile and link if the C library is present. Do not use C local goto labels in inline assembly like this.
Use asm goto as the Solution
As of GCC 4.5 (OP is using 4.6.x) there is support for asm goto extended assembly templates. asm goto allows you to specify jump targets that the inline assembly may use:
6.45.2.7 Goto Labels
asm goto allows assembly code to jump to one or more C labels. The GotoLabels section in an asm goto statement contains a comma-separated list of all C labels to which the assembler code may jump. GCC assumes that asm execution falls through to the next statement (if this is not the case, consider using the __builtin_unreachable intrinsic after the asm statement). Optimization of asm goto may be improved by using the hot and cold label attributes (see Label Attributes).
An asm goto statement cannot have outputs. This is due to an internal restriction of the compiler: control transfer instructions cannot have outputs. If the assembler code does modify anything, use the "memory" clobber to force the optimizers to flush all register values to memory and reload them if necessary after the asm statement.
Also note that an asm goto statement is always implicitly considered volatile.
To reference a label in the assembler template, prefix it with ‘%l’ (lowercase ‘L’) followed by its (zero-based) position in GotoLabels plus the number of input operands. For example, if the asm has three inputs and references two labels, refer to the first label as ‘%l3’ and the second as ‘%l4’).
Alternately, you can reference labels using the actual C label name enclosed in brackets. For example, to reference a label named carry, you can use ‘%l[carry]’. The label must still be listed in the GotoLabels section when using this approach.
The code could be written this way:
int main(void) {
__asm__ goto ("jmp %l[exit]" :::: exit);
handler:
__asm__ __volatile__("jmp 0x0");
exit:
return 0;
}
We can use asm goto. I prefer __asm__ over asm since it will not throw warnings if compiling with -ansi or -std=? options.
After the clobbers you can list the jump targets the inline assembly may use. The compiler doesn't actually know whether we jump or not, because GCC doesn't analyze the instructions inside the inline assembly template. It can't remove this jump, nor can it assume that what comes after is dead code. Using Godbolt with GCC 4.6.4, the unoptimized code (trimmed) looks like:
main:
pushl %ebp
movl %esp, %ebp
jmp .L2 # <------ this is the goto exit
jmp 0x0
.L2: # <------ exit label
movl $0, %eax
popl %ebp
ret
With optimizations enabled, the Godbolt output for GCC 4.6.4 still looks correct and appears as:
main:
jmp .L2 # <------ this is the goto exit
jmp 0x0
.L2: # <------ exit label
xorl %eax, %eax
ret
This mechanism should also work whether you have optimizations on or off, and shouldn't matter whether you are compiling for 64-bit or 32-bit x86 targets.
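As a rough illustration of the label syntax quoted from the manual, here is a minimal sketch (not part of the original answer) of an asm goto that takes one input operand and may jump to a C label. The function name is_zero is made up for this example, and the x86 branch assumes GCC 4.5 or later (or a compatible compiler); other targets fall back to plain C.

```c
/* Hypothetical helper: returns 1 when v is zero, 0 otherwise. */
static int is_zero(int v)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ goto("testl %0, %0\n\t"   /* set ZF if v == 0 */
                 "jz %l[zero]"        /* jump to the C label listed below */
                 : /* asm goto has no outputs here */
                 : "r"(v)
                 : "cc"
                 : zero);
    return 0;                         /* fall-through path: v was non-zero */
zero:
    return 1;
#else
    return v == 0;                    /* portable fallback for other targets */
#endif
}
```

Since there is exactly one input operand, the same label could also be referenced positionally as %l1 instead of %l[zero].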
Other Observations
When there are no output constraints in an extended inline assembly template the asm statement is implicitly volatile. The line
__asm__ __volatile__("jmp 0x0");
Can be written as:
__asm__ ("jmp 0x0");
asm goto statements are considered implicitly volatile. They don't require a volatile modifier either.

Would this work? It makes it so that gcc can't know the code is unreachable:
int main(void)
{
volatile int y = 1;
if (y) goto exit;
handler:
__asm__ __volatile__("jmp 0x0");
exit:
return 0;
}

If a compiler thinks it can cheat you, just cheat back: (GCC only)
int main(void) {
{
/* Place this code anywhere in the same function, where
* control flow is known to still be active (such as at the start) */
extern volatile unsigned int some_undefined_symbol;
__asm__ __volatile__(".pushsection .discard" : : : "memory");
if (some_undefined_symbol) goto handler;
__asm__ __volatile__(".popsection" : : : "memory");
}
goto exit;
handler:
__asm__ __volatile__("jmp 0x0");
exit:
return 0;
}
This solution will not add any additional overhead for meaningless instructions, though only works for GCC when used with AS (as is the default).
Explanation: .pushsection switches text output of the compiler to another section, in this case .discard (which is deleted during linking by default). The "memory" clobber prevents GCC from trying to move other text within the section that will be discarded. However, GCC doesn't realize (and never could, because the __asm__s are __volatile__) that anything happening between the two statements will be discarded.
As for some_undefined_symbol, that is literally just any symbol that is never being defined (or is actually defined, it shouldn't matter). And since the section of code using it will be discarded during linking, it won't produce any unresolved-reference errors either.
Finally, the conditional jump to the label you want to make appear as though it was reachable does exactly that. Besides the fact that it won't appear in the output binary at all, GCC realizes that it can't know anything about some_undefined_symbol, meaning it has no choice but to assume that both of the if's branches are reachable. As far as it is concerned, control flow can continue either by reaching goto exit or by jumping to handler (even though there won't be any code that could actually do this).
However, be careful when enabling garbage collection in your linker ld --gc-sections (it's disabled by default), because otherwise it might get the idea to get rid of the still unused label regardless.
EDIT:
Forget all that. Just do this:
int main(void) {
__asm__ __volatile__ goto("" : : : : handler);
goto exit;
handler:
__asm__ __volatile__("jmp 0x0");
exit:
return 0;
}
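A sketch of why the short version works, with the jmp 0x0 swapped for a harmless assignment so that it can actually be run. The empty asm goto lists handler as a possible jump target, so GCC has to treat the block as reachable even though nothing ever jumps there. The name keep_handler is hypothetical, and this assumes a compiler with asm goto support (GCC 4.5 or later).

```c
/* Hypothetical demo: the handler block is kept by the compiler,
 * but is never entered at run time. */
static int keep_handler(void)
{
    int reached = 0;

    /* Empty template: emits no instructions, but tells the compiler
     * that control might transfer to 'handler' from here. */
    __asm__ goto("" : : : : handler);
    goto done;

handler:
    reached = 1;   /* stands in for the real handler code */

done:
    return reached;
}
```

Calling keep_handler() returns 0, since the empty template never actually jumps; the point is only that the handler code should survive in the generated assembly.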

Update 2012/6/18
Just thinking about it, one can put the goto exit in an asm block, which means that only 1 line of code needs to change:
int main(void) {
__asm__ ("jmp exit");
handler:
__asm__ __volatile__("jmp $0x0");
exit:
return 0;
}
That is significantly cleaner than my other solution below (and possibly nicer than @ugoren's current one too).
This is pretty hacky, but it seems to work: hide the handler in a conditional that can never be followed under normal conditions, but stop it from being eliminated by stopping the compiler from being able to do its analysis properly with some inline assembler.
int main (void) {
int x = 0;
__asm__ __volatile__ ("" : "=r"(x));
// compiler can't tell what the value of x is now, but it's always 0
if (x) {
handler:
__asm__ __volatile__ ("jmp $0x0");
}
return 0;
}
Even with -O3 the jmp is preserved:
testl %eax, %eax
je .L2
.L3:
jmp $0x0
.L2:
xorl %eax, %eax
ret
(This seems really dodgy, so I hope there is a better way to do this. Edit: just putting a volatile in front of x works, so one doesn't need the inline asm trickery.)

I've never heard of a way to prevent gcc from removing unreachable code; it seems that no matter what you do, once gcc detects unreachable code it always removes it (use gcc's -Wunreachable-code option to see what it considers to be unreachable).
That said, you can still put this code in a static function and it won't be optimized out:
static void func(void) /* returns nothing, so declare it void */
{
__asm__ __volatile__("jmp $0x0");
}
int main(void)
{
goto exit;
handler:
func();
exit:
return 0;
}
P.S
This solution is particularly handy if you want to avoid code redundancy when implanting the same "handler" code block in more than one place in the original code.

gcc may duplicate asm statements inside functions and remove them during optimisation (even at -O0), so this will never work reliably.
One way to do this reliably is to use a global asm statement (i.e. an asm statement outside of any function). gcc will copy this straight to the output, and you can use global labels without any problems.
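A minimal sketch of the global asm approach, under some assumptions: an x86 ELF target (e.g. Linux) with the GNU assembler, and a made-up function name asm_answer. The whole routine is written at file scope, so the compiler passes it through untouched, and C code calls it through an ordinary prototype. Other targets get a plain C fallback so the example still builds.

```c
int asm_answer(void);   /* hypothetical routine defined in the asm below */

#if (defined(__x86_64__) || defined(__i386__)) && defined(__ELF__)
/* File-scope (basic) asm: no operands, so registers use a single '%'. */
__asm__(
    ".pushsection .text\n"
    ".globl asm_answer\n"
    ".type asm_answer, @function\n"
    "asm_answer:\n"
    "    movl $42, %eax\n"        /* return value goes in EAX */
    "    ret\n"
    ".size asm_answer, .-asm_answer\n"
    ".popsection\n"
);
#else
/* Portable fallback for targets the asm above does not cover. */
int asm_answer(void) { return 42; }
#endif
```

The .pushsection/.popsection pair is there so the asm does not disturb whatever section the compiler was emitting into.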

Related

How to create a label with the number from variable in asm

I want to write JIT compiler which will be based on the Brainfuck interpreter. The whole code of the program will be written in C. I created all instructions except loops. I have an idea to calculate offsets of matching loop brackets, but to do this I need to create the local labels in asm with the unique numbers. But each number in the name of the label should be a value from the variable. This is what I want to do in C:
void jit(struct bf_state *state, char *source)
{
size_t number_of_brackets = 0;
while (source[state->source_ptr] != '\0')
{
switch (source[state->source_ptr])
{
case '[':
{
number_of_brackets++;
__asm__ ("start_of_the_loop<number_of_brackets>:\n\t"
"pushl <number_of_brackets>\n\t"
"cmpb $0, (%%rax)\n\t"
"je <end_of_the_loop<number_of_brackets>>"
:
: "a" (state->memory_segment), "d" (number_of_brackets));
}
break;
case ']':
{
__asm__ ("end_of_the_loop<number_of_brackets>:\n\t"
"popl %%edx\n\t"
"cmpb $0, (%%rax)\n\t"
"jne <start_of_the_loop<number_of_brackets>>"
:
: "a" (state->memory_segment), "d" (number_of_brackets));
}
break;
}
}
}
Can I create the labels with the number from the variable in asm? This will help me a lot. I will be grateful for the answer. Thank you in advance!
You can't safely jump from one asm statement to another. You can use asm goto to tell the compiler you might jump to a C label instead of falling through, though.
But there's a fatal flaw with your whole idea for mixing asm and C to use the call-stack as a stack data structure: you can't leave rsp modified at the end of an asm statement. You'll break compiler-generated code that references stack memory relative to RSP, because -fomit-frame-pointer is on by default (except with -O0). And even if not, the compiler assumes it knows where RSP is pointing even in functions that do use a frame pointer.
BTW, pushl is illegal in 64-bit code, only 16 and 64-bit operand-sizes for push are available.
Also, if you're going to pop into a register, you should use an output operand for that constraint, not an input.
There's also another fatal flaw: inline-asm can't JIT. All the asm has to be there at build time. Just like C++ templates, start_of_the_loop<number_of_brackets> can't work if number_of_brackets isn't an assemble-time constant.
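Since the labels can't be generated at run time, one common alternative (a sketch, not something from the question) is to work out the matching brackets in plain C with an explicit array-based stack, and then use those offsets when emitting or interpreting the loop instructions. The names match_brackets and MAX_DEPTH are invented for this example.

```c
#include <stddef.h>

#define MAX_DEPTH 256   /* assumed nesting limit for this sketch */

/* For every '[' or ']' at index i, stores the index of its partner in
 * match[i]. Returns 0 on success, -1 if the brackets are unbalanced or
 * nested deeper than MAX_DEPTH. match must have room for every index
 * of source. */
static int match_brackets(const char *source, size_t *match)
{
    size_t stack[MAX_DEPTH];   /* indices of currently open '[' */
    size_t depth = 0;

    for (size_t i = 0; source[i] != '\0'; i++) {
        if (source[i] == '[') {
            if (depth == MAX_DEPTH)
                return -1;
            stack[depth++] = i;
        } else if (source[i] == ']') {
            if (depth == 0)
                return -1;             /* ']' without a matching '[' */
            size_t open = stack[--depth];
            match[open] = i;
            match[i] = open;
        }
    }
    return depth == 0 ? 0 : -1;        /* leftover '[' means unbalanced */
}
```

With the table filled in, a '[' at position i can jump to match[i] when the current cell is zero, and a ']' can jump back the same way, with no need for uniquely numbered assembler labels.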

GCC/x86 inline asm: How do you tell gcc that inline assembly section will modify %esp?

While trying to make some old code work again (https://github.com/chaos4ever/chaos/blob/master/libraries/system/system_calls.h#L387, FWIW) I discovered that some of the semantics of gcc seem to have changed in a quite subtle but still dangerous way during the latest 10-15 years... :P
The code used to work well with older versions of gcc, like 2.95. Anyway, here is the code:
static inline return_type system_call_service_get(const char *protocol_name, service_parameter_type *service_parameter,
tag_type *identification)
{
return_type return_value;
asm volatile("pushl %2\n"
"pushl %3\n"
"pushl %4\n"
"lcall %5, $0"
: "=a" (return_value),
"=g" (*service_parameter)
: "g" (identification),
"g" (service_parameter),
"g" (protocol_name),
"n" (SYSTEM_CALL_SERVICE_GET << 3));
return return_value;
}
The problem with the code above is that gcc (4.7 in my case) will compile this to the following asm code (AT&T syntax):
# 392 "../system/system_calls.h" 1
pushl 68(%esp) # This pointer (%esp + 0x68) is valid when the inline asm is entered.
pushl %eax
pushl 48(%esp) # ...but this one is not (%esp + 0x48), since two dwords have now been pushed onto the stack, so %esp is not what the compiler expects it to be
lcall $456, $0
# Restoration of %esp at this point is done in the called method (i.e. lret $12)
The problem: the variables (identification and protocol_name) are on the stack in the calling context. So gcc (with optimizations turned on, unsure if it matters) will just get the values from there and hand them over to the inline asm section. But since I'm pushing stuff on the stack, the offsets that gcc calculates will be off by 8 in the third push (pushl 48(%esp)). :)
This took me a long time to figure out, it wasn't all obvious to me at first.
The easiest way around this is of course to use the r input constraint, to ensure that the value is in a register instead. But is there another, better way? One obvious way would of course be to rewrite the whole system call interface to not push stuff on the stack in the first place (and use registers instead, like e.g. Linux), but that's not a refactoring I feel like doing tonight...
Is there any way to tell gcc inline asm that "the stack is volatile"? How have you guys been handling stuff like this in the past?
Update later the same evening: I did find a relevant gcc ML thread (https://gcc.gnu.org/ml/gcc-help/2011-06/msg00206.html) but it didn't seem to help. It seems like specifying %esp in the clobber list should make it do offsets from %ebp instead, but it doesn't work and I suspect the -O2 -fomit-frame-pointer has an effect here. I have both of these flags enabled.
What works and what doesn't:
I tried omitting -fomit-frame-pointer. No effect whatsoever. I included %esp, esp and sp in the list of clobbers.
I tried omitting -fomit-frame-pointer and -O3. This actually produces code that works, since it relies on %ebp rather than %esp.
pushl 16(%ebp)
pushl 12(%ebp)
pushl 8(%ebp)
lcall $456, $0
I tried with just having -O3 and not -fomit-frame-pointer specified in my command line. Creates bad, broken code (relies on %esp being constant within the whole assembly block, i.e. no stack frame).
I tried with skipping -fomit-frame-pointer and just using -O2. Broken code, no stack frame.
I tried with just using -O1. Broken code, no stack frame.
I tried adding cc as clobber. No can do, doesn't make any difference whatsoever.
I tried changing the input constraints to ri, giving the input & output code below. This of course works but is slightly less elegant than I had hoped. Then again, perfect is the enemy of good so maybe I will have to live with this for now.
Input C code:
static inline return_type system_call_service_get(const char *protocol_name, service_parameter_type *service_parameter,
tag_type *identification)
{
return_type return_value;
asm volatile("pushl %2\n"
"pushl %3\n"
"pushl %4\n"
"lcall %5, $0"
: "=a" (return_value),
"=g" (*service_parameter)
: "ri" (identification),
"ri" (service_parameter),
"ri" (protocol_name),
"n" (SYSTEM_CALL_SERVICE_GET << 3));
return return_value;
}
Output asm code. As can be seen, it now uses registers, which should always be safe (but maybe somewhat less performant, since the compiler has to move stuff around):
#APP
# 392 "../system/system_calls.h" 1
pushl %esi
pushl %eax
pushl %ebx
lcall $456, $0

GCC INLINE ASSEMBLY Won't Let Me Overwrite $esp

I'm writing code to temporarily use my own stack for experimentation. This worked when I used literal inline assembly. I was hardcoding the variable locations as offsets off of ebp. However, I wanted my code to work without having to hard code memory addresses into it, so I've been looking into GCC's EXTENDED INLINE ASSEMBLY. What I have is the following:
volatile intptr_t new_stack_ptr = (intptr_t) MY_STACK_POINTER;
volatile intptr_t old_stack_ptr = 0;
asm __volatile__("movl %%esp, %0\n\t"
"movl %1, %%esp"
: "=r"(old_stack_ptr) /* output */
: "r"(new_stack_ptr) /* input */
);
The point of this is to first save the stack pointer into the variable old_stack_ptr. Next, the stack pointer (%esp) is overwritten with the address I have saved in new_stack_ptr.
Despite this, I found that GCC was saving the %esp into old_stack_ptr, but was NOT replacing %esp with new_stack_ptr. Upon deeper inspection, I found it actually expanded my assembly and added its own instructions, which are the following:
mov -0x14(%ebp),%eax
mov %esp,%eax
mov %eax,%esp
mov %eax,-0x18(%ebp)
I think GCC is trying to preserve the %esp, because I don't have it explicitly declared as an "output" operand... I could be totally wrong with this...
I really wanted to use extended inline assembly to do this, because if not, it seems like I have to "hard code" the location offsets off of %ebp into the assembly, and I'd rather use the variable names like this... especially because this code needs to work on a few different systems, which seem to all offset my variables differently, so using extended inline assembly allows me to explicitly say the variable location... but I don't understand why it is doing the extra stuff and not letting me overwrite the stack pointer like it was before, ever since I started using extended assembly, it's been doing this.
I appreciate any help!!!
Okay so the problem is gcc is allocating input and output to the same register eax. You want to tell gcc that you are clobbering the output before using the input, aka. "earlyclobber".
asm __volatile__("movl %%esp, %0\n\t"
"movl %1, %%esp"
: "=&r"(old_stack_ptr) /* output */
: "r"(new_stack_ptr) /* input */
);
Notice the & sign for the output. This should fix your code.
Update: alternatively, you could force input and output to be the same register and use xchg, like so:
asm __volatile__("xchg %%esp, %0\n\t"
: "=r"(old_stack_ptr) /* output */
: "0"(new_stack_ptr) /* input */
);
Notice the "0" that says "put this into the same register as argument 0".
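To show the earlyclobber modifier without touching the stack pointer, here is a small runnable sketch. The function name add_with_earlyclobber is invented for the example, and the asm branch assumes an x86 target with GCC-style extended asm. The output is written by the first instruction while an input is still needed by the second, which is exactly the situation "&" is meant for.

```c
/* Hypothetical demo: returns a + b. */
static int add_with_earlyclobber(int a, int b)
{
    int out;
#if defined(__x86_64__) || defined(__i386__)
    __asm__("movl %1, %0\n\t"   /* out = a: the output is written first   */
            "addl %2, %0"       /* out += b: input b is read afterwards   */
            : "=&r"(out)        /* '&' keeps out away from a's and b's registers */
            : "r"(a), "r"(b)
            : "cc");
#else
    out = a + b;                /* portable fallback for other targets */
#endif
    return out;
}
```

Without the "&", the compiler would be free to put out and b in the same register, in which case the first movl would destroy b before the addl reads it.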

Swap with push / assignment / pop in GNU C inline assembly?

I was reading some answers and questions on here and kept coming across this suggestion, but I noticed no one ever actually explained "exactly" what you need to do to do it on Windows, using Intel and the GCC compiler. Commented below is exactly what I am trying to do.
#include <stdio.h>
int main()
{
int x = 1;
int y = 2;
//assembly code begin
/*
push x into stack; < Need Help
x=y; < With This
pop stack into y; < Please
*/
//assembly code end
printf("x=%d,y=%d",x,y);
getchar();
return 0;
}
You can't just push/pop safely from inline asm, if it's going to be portable to systems with a red-zone. That includes every non-Windows x86-64 platform. (There's no way to tell gcc you want to clobber it). Well, you could add rsp, -128 first to skip past the red-zone before pushing/popping anything, then restore it later. But then you can't use an "m" constraint, because the compiler might use RSP-relative addressing with offsets that assume RSP hasn't been modified.
But really this is a ridiculous thing to be doing in inline asm.
Here's how you use inline-asm to swap two C variables:
#include <stdio.h>
int main()
{
int x = 1;
int y = 2;
asm("" // no actual instructions.
: "=r"(y), "=r"(x) // request both outputs in the compiler's choice of register
: "0"(x), "1"(y) // matching constraints: request each input in the same register as the other output
);
// apparently "=m" doesn't compile: you can't use a matching constraint on a memory operand
printf("x=%d,y=%d\n",x,y);
// getchar(); // Set up your terminal not to close after the program exits if you want similar behaviour: don't embed it into your programs
return 0;
}
gcc -O3 output (targeting the x86-64 System V ABI, not Windows) from the Godbolt compiler explorer:
.section .rodata
.LC0:
.string "x=%d,y=%d"
.section .text
main:
sub rsp, 8
mov edi, OFFSET FLAT:.LC0
xor eax, eax
mov edx, 1
mov esi, 2
#APP
# 8 "/tmp/gcc-explorer-compiler116814-16347-5i3lz1/example.cpp" 1
# I used "\n" instead of just "" so we could see exactly where our inline-asm code ended up.
# 0 "" 2
#NO_APP
call printf
xor eax, eax
add rsp, 8
ret
C variables are a high level concept; it doesn't cost anything to decide that the same registers now logically hold different named variables, instead of swapping the register contents without changing the varname->register mapping.
When hand-writing asm, use comments to keep track of the current logical meaning of different registers, or parts of a vector register.
The inline-asm didn't lead to any extra instructions outside the inline-asm block either, so it's perfectly efficient in this case. Still, the compiler can't see through it, and doesn't know that the values are still 1 and 2, so further constant-propagation would be defeated. https://gcc.gnu.org/wiki/DontUseInlineAsm
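If you do want a real exchange instruction for learning purposes, read-write "+r" operands are one way to express it. This is only a sketch: swap_ints is a made-up name, the asm branch assumes an x86 target, and a plain C swap is used everywhere else.

```c
/* Hypothetical demo: swaps *a and *b. */
static void swap_ints(int *a, int *b)
{
    int x = *a;
    int y = *b;
#if defined(__x86_64__) || defined(__i386__)
    /* "+r": each operand is both read and written, in a register
     * of the compiler's choice. No stack access is involved. */
    __asm__("xchgl %0, %1" : "+r"(x), "+r"(y));
#else
    int t = x;                  /* portable fallback for other targets */
    x = y;
    y = t;
#endif
    *a = x;
    *b = y;
}
```

This costs an actual instruction, unlike the empty-template version, so it is strictly for illustration.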
#include <stdio.h>
int main()
{
int x=1;
int y=2;
printf("x::%d,y::%d\n",x,y);
__asm__( "movl %1, %%eax;"
"movl %%eax, %0;"
:"=r"(y)
:"r"(x)
:"%eax"
);
printf("x::%d,y::%d\n",x,y);
return 0;
}
/* Load x to eax
Load eax to y */
This copies the value of x into y by way of EAX (it does not swap the two values). Please note that the clobber list tells GCC that EAX is modified. For educational purposes this is okay, but I find it more suitable to leave micro-optimizations to the compiler.
You can use extended inline assembly. It is a compiler feature which allows you to write assembly instructions within your C code. A good reference for inline gcc assembly is available here.
The following code swaps x and y using push and pop instructions.
( compiled and tested using gcc on x86_64 )
This is only safe if compiled with -mno-red-zone, or if you subtract 128 from RSP before pushing anything. It will happen to work without problems in some functions: testing with one set of surrounding code is not sufficient to verify the correctness of something you did with GNU C inline asm.
#include <stdio.h>
int main()
{
int x = 1;
int y = 2;
asm volatile (
"pushq %%rax\n" /* Push x into the stack */
"movq %%rbx, %%rax\n" /* Copy y into x */
"popq %%rbx\n" /* Pop x into y */
: "=b"(y), "=a"(x) /* OUTPUT values */
: "a"(x), "b"(y) /* INPUT values */
: /*No need for the clobber list, since the compiler knows
which registers have been modified */
);
printf("x=%d,y=%d",x,y);
getchar();
return 0;
}
Result x=2 y=1, as you expected.
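A hedged sketch of the "skip the red zone first" variant mentioned above, for x86-64 only. The name swap_via_stack is invented here. It uses only register operands, so the compiler never generates RSP-relative addresses inside the asm, and it restores RSP before the statement ends. Other targets fall back to plain C.

```c
/* Hypothetical demo: swaps *a and *b by pushing and popping. */
static void swap_via_stack(long *a, long *b)
{
    long x = *a;
    long y = *b;
#if defined(__x86_64__)
    __asm__ volatile(
        "lea -128(%%rsp), %%rsp\n\t"  /* step over the 128-byte red zone */
        "pushq %0\n\t"                /* save x on the stack             */
        "movq %1, %0\n\t"             /* x = y                           */
        "popq %1\n\t"                 /* y = saved x                     */
        "lea 128(%%rsp), %%rsp"       /* put RSP back where it was       */
        : "+r"(x), "+r"(y)
        :
        : "memory");
#else
    long t = x;                       /* portable fallback */
    x = y;
    y = t;
#endif
    *a = x;
    *b = y;
}
```

lea is used instead of add/sub so the flags are left alone and no "cc" clobber is needed.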
The Intel compiler works in a similar way; I think you just have to change the keyword asm to __asm__. You can find info about inline assembly for the Intel compiler here.

Can A C Compiler Eliminate This Conditional Test At Runtime?

Let's say I have pseudocode like this:
main() {
BOOL b = get_bool_from_environment(); //get it from a file, network, registry, whatever
while(true) {
do_stuff(b);
}
}
do_stuff(BOOL b) {
if(b)
path_a();
else
path_b();
}
Now, since we know that the external environment can influence get_bool_from_environment() to potentially produce either a true or false result, then we know that the code for both the true and false branches of if(b) must be included in the binary. We can't simply omit path_a(); or path_b(); from the code.
BUT -- we only set BOOL b the one time, and we always reuse the same value after program initialization.
If I were to make this valid C code and then compile it using gcc -O0, the if(b) would be repeatedly evaluated on the processor each time that do_stuff(b) is invoked, which inserts what are, in my opinion, needless instructions into the pipeline for a branch that is basically static after initialization.
If I were to assume that I actually had a compiler that was as stupid as gcc -O0, I would re-write this code to include a function pointer, and two separate functions, do_stuff_a() and do_stuff_b(), which don't perform the if(b) test, but simply go ahead and perform one of the two paths. Then, in main(), I would assign the function pointer based on the value of b, and call that function in the loop. This eliminates the branch, though it admittedly adds a memory access for the function pointer dereference (due to architecture implementation I don't think I really need to worry about that).
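The function-pointer rewrite described in the previous paragraph might look roughly like this. It is a sketch only: the counters, the iterations parameter (standing in for the infinite loop) and the names select_path and run_loop are all made up so the example can be run and checked.

```c
/* Counters standing in for the real work done by each path. */
static int calls_a, calls_b;

static void path_a(void) { calls_a++; }
static void path_b(void) { calls_b++; }

typedef void (*stuff_fn)(void);

/* The test on b happens exactly once, here. */
static stuff_fn select_path(int b)
{
    return b ? path_a : path_b;
}

/* The loop body is an indirect call with no conditional branch on b. */
static void run_loop(int b, int iterations)
{
    stuff_fn do_stuff = select_path(b);
    for (int i = 0; i < iterations; i++)
        do_stuff();
}
```

The trade-off is the one already mentioned: the branch goes away, but every iteration now makes an indirect call that the compiler generally cannot inline.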
Is it possible, even in principle, for a compiler to take code of the same style as the original pseudocode sample, and to realize that the test is unnecessary once the value of b is assigned once in main()? If so, what is the theoretical name for this compiler optimization, and can you please give an example of an actual compiler implementation (open source or otherwise) which does this?
I realize that compilers can't generate dynamic code at runtime, and the only types of systems that could do that in principle would be bytecode virtual machines or interpreters (e.g. Java, .NET, Ruby, etc.) -- so the question remains whether or not it is possible to do this statically and generate code that contains both the path_a(); branch and the path_b() branch, but avoid evaluating the conditional test if(b) for every call of do_stuff(b);.
If you tell your compiler to optimise, you have a good chance that the if(b) is evaluated only once.
Slightly modifying the given example, using the standard _Bool instead of BOOL, and adding the missing return types and declarations,
_Bool get_bool_from_environment(void);
void path_a(void);
void path_b(void);
void do_stuff(_Bool b) {
if(b)
path_a();
else
path_b();
}
int main(void) {
_Bool b = get_bool_from_environment(); //get it from a file, network, registry, whatever
while(1) {
do_stuff(b);
}
}
the (relevant part of the) produced assembly by clang -O3 [clang-3.0] is
callq get_bool_from_environment
cmpb $1, %al
jne .LBB1_2
.align 16, 0x90
.LBB1_1: # %do_stuff.exit.backedge.us
# =>This Inner Loop Header: Depth=1
callq path_a
jmp .LBB1_1
.align 16, 0x90
.LBB1_2: # %do_stuff.exit.backedge
# =>This Inner Loop Header: Depth=1
callq path_b
jmp .LBB1_2
b is tested only once, and main jumps into an infinite loop of either path_a or path_b depending on the value of b. If path_a and path_b are small enough, they would be inlined (I strongly expect). With -O and -O2, the code produced by clang would evaluate b in each iteration of the loop.
gcc (4.6.2) behaves similarly with -O3:
call get_bool_from_environment
testb %al, %al
jne .L8
.p2align 4,,10
.p2align 3
.L9:
call path_b
.p2align 4,,6
jmp .L9
.L8:
.p2align 4,,8
call path_a
.p2align 4,,8
call path_a
.p2align 4,,5
jmp .L8
Oddly, it unrolled the loop for path_a, but not for path_b. With -O2 or -O, it would however call do_stuff in the infinite loop.
Hence to
Is it possible, even in principle, for a compiler to take code of the same style as the original pseudocode sample, and to realize that the test is unnecessary once the value of b is assigned once in main()?
the answer is a definitive Yes, it is possible for compilers to recognize this and take advantage of that fact. Good compilers do when asked to optimise hard.
If so, what is the theoretical name for this compiler optimization, and can you please give an example of an actual compiler implementation (open source or otherwise) which does this?
I don't know the name of the optimisation, but two implementations doing that are gcc and clang (at least, recent enough releases).
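For what it's worth, hoisting a loop-invariant test out of a loop like this is usually called loop unswitching; in GCC it is controlled by -funswitch-loops, which is one of the passes -O3 adds on top of -O2. A hand-written sketch of the before and after shapes follows. The counters, the iterations parameter and the function names are assumptions made so the example can run and terminate.

```c
static int calls_a, calls_b;

static void path_a(void) { calls_a++; }
static void path_b(void) { calls_b++; }

/* Original shape: b is tested on every iteration. */
static void loop_with_test(int b, int iterations)
{
    for (int i = 0; i < iterations; i++) {
        if (b)
            path_a();
        else
            path_b();
    }
}

/* Unswitched shape: b is tested once, then one of two
 * specialised loops runs with no test inside it. */
static void loop_unswitched(int b, int iterations)
{
    if (b) {
        for (int i = 0; i < iterations; i++)
            path_a();
    } else {
        for (int i = 0; i < iterations; i++)
            path_b();
    }
}
```

Both functions make the same calls in the same order; the second is simply what the -O3 assembly above corresponds to.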

Resources