Regarding stack reuse of a function calling itself? - c

if a function calls itself while defining variables at the same time
would it result in stack overflow? Is there any option in gcc to reuse the same stack.
void funcnew(void)
{
int a=10;
int b=20;
funcnew();
return ;
}
can a function reuse the stack-frame which it used earlier?
What is the option in gcc to reuse the same frame in tail recursion??

Yes. See
-foptimize-sibling-calls
Optimize sibling and tail recursive calls.
Enabled at levels -O2, -O3, -Os.
Your function is compiled to:
funcstack:
.LFB0:
.cfi_startproc
xorl %eax, %eax
jmp func
.cfi_endproc
(note the jump to func)
Reusing the stack frame when a function end by a call -- this include in its full generality manipulating the stack to put the parameters at the correct place and replacing the function call by a jump to the start of the function -- is a well known optimisation called [i]tail call removal[/i]. It is mandated by some languages (scheme for instance) for recursive calls (a recursive call is the natural way to express a loop in these languages). As given above, gcc has the optimisation implemented for C, but I'm not sure which other compiler has it, I would not depend on it for portable code. And note that I don't know which restriction there are on it -- I'm not sure for instance that gcc will manipulate the stack if the parameters types are different.

Even without defining the parameters you'd get a stackoverflow. Since the return address also is pushed onto the stack.
It is (I've learned this recently) possible that the compiler optimizes your loop into a tail recursion (which makes the stack not grow at all). Link to tail recursion question on SO

No, each recursion is a new stack frame. If the recursion is infinitely deep, then the stack needed is also infinite, so you get a stack overflow.

Yes, in some cases the compiler may be able to perform something called tail call optimization. You should check with your compiler manual. (AProgrammer seems to have quoted the GCC manual in his answer.)
This is an essential optimization when implementing for example functional languages, where such code occurs frequently.

You can;t do away with the stack frame altogether, as it is needed for the return address. unless you are using tail-recursion, and your compiler has optimised it to a loop. But to be completely technically honest, you can do away with all the variables in the the frame by making them static. However, this is almost certainly not what you want to do, and you should not do it without knowing exactly what you are doing, which as you had to ask this question, you don't.

As others have noted, it is only possible if (1) your compiler supports tail call optimization, and (2) if your function is eligible for such an optimization. The optimization is to reuse the existing stack and perform a JMP (i.e., a GOTO in assembly) instead of a CALL.
In fact, your example function is indeed eligible for such an optimization. The reason is that the last thing your function does before returning is call itself; it doesn't have to do anything after the last call to funcnew(). However, only certain compilers will perform such an optimization. GCC, for instance, will do it. For more info, see Which, if any, C++ compilers do tail-recursion optimization?
The classic material on this is the factorial function. Let's make a recursive factorial function that is not eligible for tail call optimization (TCO).
int fact(int n)
{
if ( n == 1 ) return 1;
return n*fact(n-1);
}
The last thing it does is to multiply n with the result from fact(n-1). By somehow eliminating this last operation, we would be able to reuse the stack. Let's introduce an accumulator variable that will compute the answer for us:
int fact_helper(int n, int acc)
{
if ( n == 1 ) return acc;
return fact_helper(n-1, n*acc);
}
int fact_acc(int n)
{
return fact_helper(n, 1);
}
The function fact_helper does the work, while fact_acc is just a convenience function to initialize the accumulator variable.
Note how the last thing fact_helper does is to call itself. This CALL can be converted to a JMP by reusing the existing stack for the variables.
With GCC, you can verify that it is optimized to a jump by looking at the generated assembly, for instance gcc -c -O3 -Wa,-a,-ad fact.c:
...
37 L12:
38 0040 0FAFC2 imull %edx, %eax
39 0043 83EA01 subl $1, %edx
40 0046 83FA01 cmpl $1, %edx
41 0049 75F5 jne L12
...
Some programming languages, such as Scheme, will actually guarantee that proper implementations will perform such optimizations. They will even do it for non-recursive tail calls.

Related

Absolute worst case stack size based on automatic varaibles

In a C99 program, under the (theoretical) assumption that I'm not using variable-length arrays, and each of my automatic variables can only exist once at a time in the whole stack (by forbidding circular function calls and explicit recursion), if I sum up all the space they are consuming, could I declare that this is the maximal stack size that can ever happen?
A bit of context here: I told a friend that I wrote a program not using dynamic memory allocation ("malloc") and allocate all memory static (by modeling all my state variables in a struct, which I then declared global). He then told me that if I'm using automatic variables, I still make use of dynamic memory. I argued that my automatic variables are not state variables but control variables, so my program is still to be considered static. We then discussed that there has to be a way to make a statement about the absolute worst-case behaviour about my program, so I came up with the above question.
Bonus question: If the assumptions above hold, I could simply declare all automatic variables static and would end up with a "truly" static program?
Even if array sizes are constant a C implementation could allocate arrays and even structures dynamically. I'm not aware of any that do (anyone) and it would appear quite unhelpful. But the C Standard doesn't make such guarantees.
There is also (almost certainly) some further overhead in the stack frame (the data added to the stack on call and released on return).
You would need to declare all your functions as taking no parameters and returning void to ensure no program variables in the stack. Finally the 'return address' of where execution of a function is to continue after return is pushed onto the stack (at least logically).
So having removed all parameters, automatic variables and return values to you 'state' struct there will still be something going on to the stack - probably.
I say probably because I'm aware of a (non-standard) embedded C compiler that forbids recursion that can determine the maximum size of the stack by examining the call tree of the whole program and identify the call chain that reaches the peek size of the stack.
You could achieve this a monstrous pile of goto statements (some conditional where a functon is logically called from two places or by duplicating code.
It's often important in embedded code on devices with tiny memory to avoid any dynamic memory allocation and know that any 'stack-space' will never overflow.
I'm happy this is a theoretical discussion. What you suggest is a mad way to write code and would throw away most of (ultimately limited) services C provides to infrastructure of procedural coding (pretty much the call stack)
Footnote: See the comment below about the 8-bit PIC architecture.
Bonus question: If the assumptions above hold, I could simply declare
all automatic variables static and would end up with a "truly" static
program?
No. This would change the function of the program. static variables are initialized only once.
Compare this 2 functions:
int canReturn0Or1(void)
{
static unsigned a=0;
a++;
if(a>1)
{
return 1;
}
return 0;
}
int willAlwaysReturn0(void)
{
unsigned a=0;
a++;
if(a>1)
{
return 1;
}
return 0;
}
In a C99 program, under the (theoretical) assumption that I'm not using variable-length arrays, and each of my automatic variables can only exist once at a time in the whole stack (by forbidding circular function calls and explicit recursion), if I sum up all the space they are consuming, could I declare that this is the maximal stack size that can ever happen?
No, because of function pointers..... Read n1570.
Consider the following code, where rand(3) is some pseudo random number generator (it could also be some input from a sensor) :
typedef int foosig(int);
int foo(int x) {
foosig* fptr = (x>rand())?&foo:NULL;
if (fptr)
return (*fptr)(x);
else
return x+rand();
}
An optimizing compiler (such as some recent GCC suitably invoked with enough optimizations) would make a tail-recursive call for (*fptr)(x). Some other compiler won't.
Depending on how you compile that code, it would use a bounded stack or could produce a stack overflow. With some ABI and calling conventions, both the argument and the result could go thru a processor register and won't consume any stack space.
Experiment with a recent GCC (e.g. on Linux/x86-64, some GCC 10 in 2020) invoked as gcc -O2 -fverbose-asm -S foo.c then look inside foo.s. Change the -O2 to a -O0.
Observe that the naive recursive factorial function could be compiled into some iterative machine code with a good enough C compiler and optimizer. In practice GCC 10 on Linux compiling the below code:
int fact(int n)
{
if (n<1) return 1;
else return n*fact(n-1);
}
as gcc -O3 -fverbose-asm tmp/fact.c -S -o tmp/fact.s produces the following assembler code:
.type fact, #function
fact:
.LFB0:
.cfi_startproc
endbr64
# tmp/fact.c:3: if (n<1) return 1;
movl $1, %eax #, <retval>
testl %edi, %edi # n
jle .L1 #,
.p2align 4,,10
.p2align 3
.L2:
imull %edi, %eax # n, <retval>
subl $1, %edi #, n
jne .L2 #,
.L1:
# tmp/fact.c:5: }
ret
.cfi_endproc
.LFE0:
.size fact, .-fact
.ident "GCC: (Ubuntu 10.2.0-5ubuntu1~20.04) 10.2.0"
And you can observe that the call stack is not increasing above.
If you have serious and documented arguments against GCC, please submit a bug report.
BTW, you could write your own GCC plugin which would choose to randomly apply or not such an optimization. I believe it stays conforming to the C standard.
The above optimization is essential for many compilers generating C code, such as Chicken/Scheme or Bigloo.
A related theorem is Rice's theorem. See also this draft report funded by the CHARIOT project.
See also the Compcert project.

what's the purpose of pushing address of local variables on the stack(assembly)

Let's there is a function:
int caller()
{
int arg1 = 1;
int arg2 = 2
int a = test(&arg1, &arg2)
}
test(int *a, int *b)
{
...
}
so I don't understand why &arg1 and &arg2 have to be pushed on the stack too like this
I can understand that we can get address of arg1 and arg2 in the callee by using
movl 8(%ebp), %edx
movl 12(%ebp), %ecx
but if we don't push these two on the stack,
we can also can their address by using:
leal 8(%ebp), %edx
leal 12(%ebp), %ecx
so why bother pushing &arg1 and &arg2 on the stack?
In the general case, test has to work when you pass it arbitrary pointers, including to extern int global_var or whatever. Then main has to call it according to the ABI / calling convention.
So the asm definition of test can't assume anything about where int *a points, e.g. that it points into its caller's stack frame.
(Or you could look at that as optimizing away the addresses in a call-by-reference on locals, so the caller must place the pointed-to objects in the arg-passing slots, and on return those 2 dwords of stack memory hold the potentially-updated values of *a and *b.)
You compiled with optimization disabled. Especially for the special case where the caller is passing pointers to locals, the solution to this problem is to inline the whole function, which compilers will do when optimization is enabled.
Compilers are allowed to make a private clone of test that takes its args by value, or in registers, or with whatever custom calling convention the compiler wants to use. Most compilers don't actually do this, though, and rely on inlining instead of custom calling conventions for private functions to get rid of arg-passing overhead.
Or if it had been declared static test, then the compiler would already know it was private and could in theory use whatever custom calling convention it wanted without making a clone with a name like test.clone1234. gcc does sometimes actually do that for constant-propagation, e.g. if the caller passes a compile-time constant but gcc chooses not to inline. (Or can't because you used __attribute__((noinline)) static test() {})
And BTW, with a good register-args calling convention like x86-64 System V, the caller would do lea 12(%rsp), %rdi / lea 8(%rsp), %rsi / call test or something. The i386 System V calling convention is old and inefficient, passing everything on the stack forcing a store/reload.
You have basically identified one of the reasons that stack-args calling conventions have higher overhead and generally suck.
if you access arg1 and arg2 directly, it means you are accessing a portion of stack that does not belong to this function. This is somehow what happens when someone uses a buffer overflow attack to access additional data from calling stack.
When your call has arguments, arguments are pushed into stack(in your case &arg1 and &arg2) and function can use them as valid list of arguments for this function.

Inline and stack frame control

The following are artificial examples. Clearly compiler optimizations will dramatically change the final outcome. However, and I cannot stress this more: by temporarily disabling optimizations, I intend to have an upper bound on stack usage, likely, I expect that further compiler optimization can improve the situation.
The discussion in centered around GCC only. I would like to have fine control over how automatic variables get released from the stack. Scoping with blocks does not ensure that memory will be released when automatic variables go out of scope. Functions, as far as I know, do ensure that.
However, when inlining, what is the case? For example:
inline __attribute__((always_inline)) void foo()
{
uint8_t buffer1[100];
// Stack Size Measurement A
// Do something
}
void bar()
{
foo();
uint8_t buffer2[100];
// Stack Size Measurement B
// Do something else
}
Can I always expect that at measurement point B, the stack will only containbuffer2 and buffer1 has been released?
Apart from function calls (which result in additional stack usage) is there any way I can have fine control over stack deallocations?
I would like to have fine control over how automatic variables get released from the stack.
Lots of confusion here. The optimizing compiler could store some automatic variables only in registers, without using any slot in the call frame. The C language specification (n1570) does not require any call stack.
And a given register, or slot in the call frame, can be reused for different purposes (e.g. different automatic variables in different parts of the function). Register allocation is a significant role of compilers.
Can I always expect that at measurement point B, the stack will only containbuffer2 and buffer1 has been released?
Certainly not. The compiler could prove that at some later point in your code, the space for buffer1 is not useful anymore so reuse that space for other purposes.
is there any way I can have fine control over stack deallocations?
No, there is not. The call stack is an implementation detail, and might not be used (or be "abused" in your point of view) by the compiler and the generated code.
For some silly example, if buffer1 is not used in foo, the compiler might not allocate space for it. And some clever compilers might just allocate 8 bytes in it, if they can prove that only 8 first bytes of buffer1 are useful.
More seriously, in some cases, GCC is able to do tail-call optimizations.
You should be interested in invoking GCC with -fstack-reuse=all, -Os,
-Wstack-usage=256, -fstack-usage, and other options.
Of course, the concrete stack usage depends upon the optimization levels. You might also inspect the generated assembler code, e.g. with -S -O2 -fverbose-asm
For example, the following code e.c:
int f(int x, int y) {
int t[100];
t[0] = x;
t[1] = y;
return t[0]+t[1];
}
when compiled with GCC8.1 on Linux/Debian/x86-64 using gcc -S -fverbose-asm -O2 e.c gives in e.s
.text
.p2align 4,,15
.globl f
.type f, #function
f:
.LFB0:
.cfi_startproc
# e.c:5: return t[0]+t[1];
leal (%rdi,%rsi), %eax #, tmp90
# e.c:6: }
ret
.cfi_endproc
.LFE0:
.size f, .-f
and you see that the stack frame is not grown by 100*4 bytes. And this is still the case with:
int f(int x, int y, int n) {
int t[n];
t[0] = x;
t[1] = y;
return t[0]+t[1];
}
which actually generates the same machine code as above. And if instead of the + above I'm calling some inline int add(int u, int v) { return u+v; } the generated code is not changing.
Be aware of the as-if rule, and of the tricky notion of undefined behavior (if n was 1 above, it is UB).
Can I always expect that at measurement B, the stack will only containbuffer2 and buffer1 has been released?
No. It's going to depend on GCC version, target, optimization level, options.
Apart from function calls (which result in additional stack usage) is there any way I can have fine control over stack deallocations?
Your requirement is so specific I guess you will likely have to write yourself the code in assembler.
mov BYTE PTR [rbp-20], 1 and mov BYTE PTR [rbp-10], 2 only show the relative offset of stack pointer in stack frame. when considering run-time situation, they have the same peak stack usage.
There are two differences about whether using inline:
1) In function call mode, buffer1 will be released when exit from foo(). But in inline method, buffer1 will not be kept until exit from bar(), that means peak stack usage will last a longer time. 2) Function call will add a few overhead, such as saving stack frame information, comparing with inline mode

Why does "noreturn" function return?

I read this question about noreturn attribute, which is used for functions that don't return to the caller.
Then I have made a program in C.
#include <stdio.h>
#include <stdnoreturn.h>
noreturn void func()
{
printf("noreturn func\n");
}
int main()
{
func();
}
And generated assembly of the code using this:
.LC0:
.string "func"
func:
pushq %rbp
movq %rsp, %rbp
movl $.LC0, %edi
call puts
nop
popq %rbp
ret // ==> Here function return value.
main:
pushq %rbp
movq %rsp, %rbp
movl $0, %eax
call func
Why does function func() return after providing noreturn attribute?
The function specifiers in C are a hint to the compiler, the degree of acceptance is implementation defined.
First of all, _Noreturn function specifier (or, noreturn, using <stdnoreturn.h>) is a hint to the compiler about a theoretical promise made by the programmer that this function will never return. Based on this promise, compiler can make certain decisions, perform some optimizations for the code generation.
IIRC, if a function specified with noreturn function specifier eventually returns to its caller, either
by using and explicit return statement
by reaching end of function body
the behaviour is undefined. You MUST NOT return from the function.
To make it clear, using noreturn function specifier does not stop a function form returning to its caller. It is a promise made by the programmer to the compiler to allow it some more degree of freedom to generate optimized code.
Now, in case, you made a promise earlier and later, choose to violate this, the result is UB. Compilers are encouraged, but not required, to produce warnings when a _Noreturn function appears to be capable of returning to its caller.
According to chapter §6.7.4, C11, Paragraph 8
A function declared with a _Noreturn function specifier shall not return to its caller.
and, the paragraph 12, (Note the comments!!)
EXAMPLE 2
_Noreturn void f () {
abort(); // ok
}
_Noreturn void g (int i) { // causes undefined behavior if i <= 0
if (i > 0) abort();
}
For C++, the behaviour is quite similar. Quoting from chapter §7.6.4, C++14, paragraph 2 (emphasis mine)
If a function f is called where f was previously declared with the noreturn attribute and f eventually
returns, the behavior is undefined. [ Note: The function may terminate by throwing an exception. —end
note ]
[ Note: Implementations are encouraged to issue a warning if a function marked [[noreturn]] might
return. —end note ]
3 [ Example:
[[ noreturn ]] void f() {
throw "error"; // OK
}
[[ noreturn ]] void q(int i) { // behavior is undefined if called with an argument <= 0
if (i > 0)
throw "positive";
}
—end example ]
Why function func() return after providing noreturn attribute?
Because you wrote code that told it to.
If you don't want your function to return, call exit() or abort() or similar so it doesn't return.
What else would your function do other than return after it had called printf()?
The C Standard in 6.7.4 Function specifiers, paragraph 12 specifically includes an example of a noreturn function that can actually return - and labels the behavior as undefined:
EXAMPLE 2
_Noreturn void f () {
abort(); // ok
}
_Noreturn void g (int i) { // causes undefined behavior if i<=0
if (i > 0) abort();
}
In short, noreturn is a restriction that you place on your code - it tells the compiler "MY code won't ever return". If you violate that restriction, that's all on you.
noreturn is a promise. You're telling the compiler, "It may or may not be obvious, but I know, based on the way I wrote the code, that this function will never return." That way, the compiler can avoid setting up the mechanisms that would allow the function to return properly. Leaving out those mechanisms might allow the compiler to generate more efficient code.
How can a function not return? One example would be if it called exit() instead.
But if you promise the compiler that your function won't return, and the compiler doesn't arrange for it to be possible for the function to return properly, and then you go and write a function that does return, what's the compiler supposed to do? It basically has three possibilities:
Be "nice" to you and figure out a way to have the function return properly anyway.
Emit code that, when the function improperly returns, it crashes or behaves in arbitrarily unpredictable ways.
Give you a warning or error message pointing out that you broke your promise.
The compiler might do 1, 2, 3, or some combination.
If this sounds like undefined behavior, that's because it is.
The bottom line, in programming as in real life, is: Don't make promises you can't keep. Someone else might have made decisions based on your promise, and bad things can happen if you then break your promise.
The noreturn attribute is a promise that you make to the compiler about your function.
If you do return from such a function, behavior is undefined, but this doesn't mean a sane compiler will allow you to mess the state of the application completely by removing the ret statement, especially since the compiler will often even be able to deduce that a return is indeed possible.
However, if you write this:
noreturn void func(void)
{
printf("func\n");
}
int main(void)
{
func();
some_other_func();
}
then it's perfectly reasonable for the compiler to remove the some_other_func completely, it if feels like it.
As others have mentioned, this is classic undefined behavior. You promised func wouldn't return, but you made it return anyway. You get to pick up the pieces when that breaks.
Although the compiler compiles func in the usual manner (despite your noreturn), the noreturn affects calling functions.
You can see this in the assembly listing: the compiler has assumed, in main, that func won't return. Therefore, it literally deleted all of the code after the call func (see for yourself at https://godbolt.org/g/8hW6ZR). The assembly listing isn't truncated, it literally just ends after the call func because the compiler assumes any code after that would be unreachable. So, when func actually does return, main is going to start executing whatever crap follows the main function - be it padding, immediate constants, or a sea of 00 bytes. Again - very much undefined behavior.
This is transitive - a function that calls a noreturn function in all possible code paths can, itself, be assumed to be noreturn.
According to this
If the function declared _Noreturn returns, the behavior is undefined. A compiler diagnostic is recommended if this can be detected.
It is the programmer's responsibility to make sure that this function never returns, e.g. exit(1) at the end of the function.
ret simply means that the function returns control back to the caller. So, main does call func, the CPU executes the function, and then, with ret, the CPU continues execution of main.
Edit
So, it turns out, noreturn does not make the function not return at all, it's just a specifier that tells the compiler that the code of this function is written in such a way that the function won't return. So, what you should do here is to make sure that this function actually doesn't return control back to the callee. For example, you could call exit inside it.
Also, given what I've read about this specifier it seems that in order to make sure the function won't return to its point of invocation, one should call another noreturn function inside it and make sure that the latter is always run (in order to avoid undefined behavior) and doesn't cause UB itself.
no return function does not save the registers on the entry as it is not necessary. It makes the optimisations easier. Great for the scheduler routine for example.
See the example here:
https://godbolt.org/g/2N3THC and spot the difference
TL:DR: It's a missed-optimization by gcc.
noreturn is a promise to the compiler that the function won't return. This allows optimizations, and is useful especially in cases where it's hard for the compiler to prove that a loop won't ever exit, or otherwise prove there's no path through a function that returns.
GCC already optimizes main to fall off the end of the function if func() returns, even with the default -O0 (minimum optimization level) that it looks like you used.
The output for func() itself could be considered a missed optimization; it could just omit everything after the function call (since having the call not return is the only way the function itself can be noreturn). It's not a great example since printf is a standard C function that is known to return normally (unless you setvbuf to give stdout a buffer that will segfault?)
Lets use a different function that the compiler doesn't know about.
void ext(void);
//static
int foo;
_Noreturn void func(int *p, int a) {
ext();
*p = a; // using function args after a function call
foo = 1; // requires save/restore of registers
}
void bar() {
func(&foo, 3);
}
(Code + x86-64 asm on the Godbolt compiler explorer.)
gcc7.2 output for bar() is interesting. It inlines func(), and eliminates the foo=3 dead store, leaving just:
bar:
sub rsp, 8 ## align the stack
call ext
mov DWORD PTR foo[rip], 1
## fall off the end
Gcc still assumes that ext() is going to return, otherwise it could have just tail-called ext() with jmp ext. But gcc doesn't tailcall noreturn functions, because that loses backtrace info for things like abort(). Apparently inlining them is ok, though.
Gcc could have optimized by omitting the mov store after the call as well. If ext returns, the program is hosed, so there's no point generating any of that code. Clang does make that optimization in bar() / main().
func itself is more interesting, and a bigger missed optimization.
gcc and clang both emit nearly the same thing:
func:
push rbp # save some call-preserved regs
push rbx
mov ebp, esi # save function args for after ext()
mov rbx, rdi
sub rsp, 8 # align the stack before a call
call ext
mov DWORD PTR [rbx], ebp # *p = a;
mov DWORD PTR foo[rip], 1 # foo = 1
add rsp, 8
pop rbx # restore call-preserved regs
pop rbp
ret
This function could assume that it doesn't return, and use rbx and rbp without saving/restoring them.
Gcc for ARM32 actually does that, but still emits instructions to return otherwise cleanly. So a noreturn function that does actually return on ARM32 will break the ABI and cause hard-to-debug problems in the caller or later. (Undefined behaviour allows this, but it's at least a quality-of-implementation problem: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82158.)
This is a useful optimization in cases where gcc can't prove whether a function does or doesn't return. (It's obviously harmful when the function does simply return, though. Gcc warns when it's sure a noreturn function does return.) Other gcc target architectures don't do this; that's also a missed optimization.
But gcc doesn't go far enough: optimizing away the return instruction as well (or replacing it with an illegal instruction) would save code size and guarantee noisy failure instead of silent corruption.
And if you're going to optimize away the ret, optimizing away everything that's only needed if the function will return makes sense.
Thus, func() could be compiled to:
sub rsp, 8
call ext
# *p = a; and so on assumed to never happen
ud2 # optional: illegal insn instead of fall-through
Every other instruction present is a missed optimization. If ext is declared noreturn, that's exactly what we get.
Any basic block that ends with a return could be assumed to never be reached.

Can A C Compiler Eliminate This Conditional Test At Runtime?

Let's say I have pseudocode like this:
main() {
BOOL b = get_bool_from_environment(); //get it from a file, network, registry, whatever
while(true) {
do_stuff(b);
}
}
do_stuff(BOOL b) {
if(b)
path_a();
else
path_b();
}
Now, since we know that the external environment can influence get_bool_from_environment() to potentially produce either a true or false result, then we know that the code for both the true and false branches of if(b) must be included in the binary. We can't simply omit path_a(); or path_b(); from the code.
BUT -- we only set BOOL b the one time, and we always reuse the same value after program initialization.
If I were to make this valid C code and then compile it using gcc -O0, the if(b) would be repeatedly evaluated on the processor each time that do_stuff(b) is invoked, which inserts what are, in my opinion, needless instructions into the pipeline for a branch that is basically static after initialization.
If I were to assume that I actually had a compiler that was as stupid as gcc -O0, I would re-write this code to include a function pointer, and two separate functions, do_stuff_a() and do_stuff_b(), which don't perform the if(b) test, but simply go ahead and perform one of the two paths. Then, in main(), I would assign the function pointer based on the value of b, and call that function in the loop. This eliminates the branch, though it admittedly adds a memory access for the function pointer dereference (due to architecture implementation I don't think I really need to worry about that).
Is it possible, even in principle, for a compiler to take code of the same style as the original pseudocode sample, and to realize that the test is unnecessary once the value of b is assigned once in main()? If so, what is the theoretical name for this compiler optimization, and can you please give an example of an actual compiler implementation (open source or otherwise) which does this?
I realize that compilers can't generate dynamic code at runtime, and the only types of systems that could do that in principle would be bytecode virtual machines or interpreters (e.g. Java, .NET, Ruby, etc.) -- so the question remains whether or not it is possible to do this statically and generate code that contains both the path_a(); branch and the path_b() branch, but avoid evaluating the conditional test if(b) for every call of do_stuff(b);.
If you tell your compiler to optimise, you have a good chance that the if(b) is evaluated only once.
Slightly modifying the given example, using the standard _Bool instead of BOOL, and adding the missing return types and declarations,
_Bool get_bool_from_environment(void);
void path_a(void);
void path_b(void);
void do_stuff(_Bool b) {
if(b)
path_a();
else
path_b();
}
int main(void) {
_Bool b = get_bool_from_environment(); //get it from a file, network, registry, whatever
while(1) {
do_stuff(b);
}
}
the (relevant part of the) produced assembly by clang -O3 [clang-3.0] is
callq get_bool_from_environment
cmpb $1, %al
jne .LBB1_2
.align 16, 0x90
.LBB1_1: # %do_stuff.exit.backedge.us
# =>This Inner Loop Header: Depth=1
callq path_a
jmp .LBB1_1
.align 16, 0x90
.LBB1_2: # %do_stuff.exit.backedge
# =>This Inner Loop Header: Depth=1
callq path_b
jmp .LBB1_2
b is tested only once, and main jumps into an infinite loop of either path_a or path_b depending on the value of b. If path_a and path_b are small enough, they would be inlined (I strongly expect). With -O and -O2, the code produced by clang would evaluate b in each iteration of the loop.
gcc (4.6.2) behaves similarly with -O3:
call get_bool_from_environment
testb %al, %al
jne .L8
.p2align 4,,10
.p2align 3
.L9:
call path_b
.p2align 4,,6
jmp .L9
.L8:
.p2align 4,,8
call path_a
.p2align 4,,8
call path_a
.p2align 4,,5
jmp .L8
oddly, it unrolled the loop for path_a, but not for path_b. With -O2 or -O, it would however call do_stuff in the infinite loop.
Hence to
Is it possible, even in principle, for a compiler to take code of the same style as the original pseudocode sample, and to realize that the test is unnecessary once the value of b is assigned once in main()?
the answer is a definitive Yes, it is possible for compilers to recognize this and take advantage of that fact. Good compilers do when asked to optimise hard.
If so, what is the theoretical name for this compiler optimization, and can you please give an example of an actual compiler implementation (open source or otherwise) which does this?
I don't know the name of the optimisation, but two implementations doing that are gcc and clang (at least, recent enough releases).

Resources