Compiler optimization call-ret vs jmp - c

I am building one of my projects and looking at the generated listing file (target: x86-64). My code looks like:
int func_1(var1,var2){
asm_inline_(
)
func_2(var1,var2);
return_1;
}
void func_2(var_1,var_2){
asm __inline__(
)
func_3();
}
/**** Jump to kernel ---> System call stub in assembly. This func in .S file***/
void func_3(){
}
When I look at the assembly code, I find that a "jmp" instruction is used instead of a "call-return" pair when calling func_2 and func_3. I am sure it is one of the compiler's optimizations, but I have not explored how to disable it. (GCC)
The moment I add some volatile variables to func_2 and func_3 and increment them, the "jmp" gets replaced by a "call-ret" pair.
I am bemused to see the behavior because those variables are useless and they don't serve any purpose.
Can someone please explain the behavior?
Thanks

If code jumps to the start of another function rather than calling it, when the jumped-to function returns, it will return back to the point where the outer function was called from, ignoring any more of the first function after that point. Assuming the behaviour is correct (the first function contributed nothing else to the execution after that point anyway), this is an optimisation because it reduces the number of instructions and stack manipulations by one level.
In the given example, the behaviour is correct; there's no local stack to pop and no value to return, so there is no code that needs to run after the call. (return_1, assuming it's not a macro for something, is a pure expression and therefore does nothing no matter its value.) So there's no reason to keep the stack frame around for the future when it has nothing more to contribute to events.
If you add volatile variables to the function bodies, you aren't just adding variables whose flow the compiler can analyse - you're adding slots that you've explicitly told the compiler could be accessed outside the normal control flow it can predict. The volatile qualifier warns the compiler that even though there's no obvious way for the variables to escape, something outside has a way to get their address and write to it at any time. So it can't reduce their lifetime, because it's been told that code outside the function might still try to write to that stack space; and obviously that means the stack frame needs to continue to exist for its entire declared lifespan.
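As a rough sketch of the two situations discussed above (the names leaf, tail_caller and framed_caller are made up for illustration, and the noinline attribute is a GCC/Clang extension used only so that the call survives optimization): the first wrapper has nothing left to do once leaf() returns, so the compiler may replace the call with a jump. The second adds a volatile local, which is the change the question reports as bringing call/ret back. Whether a jmp actually appears depends on the compiler version and flags, so check the generated assembly rather than trusting this sketch.

```c
/* Hypothetical names; compile with e.g. "gcc -O2 -S" and read the output. */

/* noinline (GCC/Clang extension) keeps the call visible in the assembly. */
__attribute__((noinline)) int leaf(int a, int b)
{
    return a + b;
}

/* Nothing remains to be done after leaf() returns, so the compiler is
   free to emit "jmp leaf" instead of "call leaf" followed by "ret". */
int tail_caller(int a, int b)
{
    return leaf(a, b);
}

/* Same call, preceded by an increment of a volatile local.
   The question reports that GCC used call/ret in this situation. */
int framed_caller(int a, int b)
{
    volatile int counter = 0;
    counter++;
    return leaf(a, b);
}
```

With GCC, this transformation is controlled by -foptimize-sibling-calls, so building with -fno-optimize-sibling-calls is one way to switch it off without touching the source.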

Related

Is there any practical use for a function that does nothing?

Would there be any use for a function that does nothing when run, i.e:
void Nothing() {}
Note, I am not talking about a function that waits for a certain amount of time, like sleep(), just something that takes as much time as the compiler / interpreter gives it.
Such a function could be necessary as a callback function.
Suppose you had a function that looked like this:
void do_something(int param1, char *param2, void (*callback)(void))
{
// do something with param1 and param2
callback();
}
This function receives a pointer to a function which it subsequently calls. If you don't particularly need to use this callback for anything, you would pass a function that does nothing:
do_something(3, "test", Nothing);
When I've created tables that contain function pointers, I do use empty functions.
For example:
typedef int(*EventHandler_Proc_t)(int a, int b); // A function-pointer to be called to handle an event
struct
{
Event_t event_id;
EventHandler_Proc_t proc;
} EventTable[] = { // An array of Events, and Functions to be called when the event occurs
{ EventInitialize, InitializeFunction },
{ EventIncrement, IncrementFunction },
{ EventNOP, NothingFunction }, // Empty function is used here.
};
In this example table, I could put NULL in place of the NothingFunction, and check if the .proc is NULL before calling it. But I think it keeps the code simpler to put a do-nothing function in the table.
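A minimal sketch of how such a table might be used. The Event_t values, the handler bodies and the dispatch() function are assumptions filled in so that the example compiles; only the table layout comes from the text above.

```c
#include <stddef.h>

/* Hypothetical event ids and handlers, filled in so the table compiles. */
typedef enum { EventInitialize, EventIncrement, EventNOP } Event_t;
typedef int (*EventHandler_Proc_t)(int a, int b);

static int InitializeFunction(int a, int b) { (void)a; (void)b; return 0; }
static int IncrementFunction(int a, int b)  { return a + b; }
static int NothingFunction(int a, int b)    { (void)a; (void)b; return 0; }

static const struct {
    Event_t event_id;
    EventHandler_Proc_t proc;
} EventTable[] = {
    { EventInitialize, InitializeFunction },
    { EventIncrement,  IncrementFunction },
    { EventNOP,        NothingFunction },   /* empty function, not NULL */
};

/* Look the event up and call its handler. No NULL test is needed because
   every entry holds a callable function. Returns -1 for unknown events. */
int dispatch(Event_t event, int a, int b)
{
    for (size_t i = 0; i < sizeof EventTable / sizeof EventTable[0]; i++) {
        if (EventTable[i].event_id == event) {
            return EventTable[i].proc(a, b);
        }
    }
    return -1;
}
```

The trade-off is one extra (empty) call for EventNOP in exchange for a dispatch loop with no special case.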
Yes. Quite a lot of things want to be given a function to notify about a certain thing happening (callbacks). A function that does nothing is a good way to say "I don't care about this."
I am not aware of any examples in the standard library, but many libraries built on top have function pointers for events.
For example, glib defines a callback "GLib.LogFunc(log_domain, log_level, message, *user_data)" for providing the logger. An empty function would be the callback you provide when logging is disabled.
One use case would be as a possibly temporary stub function midway through a program's development.
If I'm doing some amount of top-down development, it's common for me to design some function prototypes, write the main function, and at that point, want to run the compiler to see if I have any syntax errors so far. To make that compile happen I need to implement the functions in question, which I'll do by initially just creating empty "stubs" which do nothing. Once I pass that compile test, I can go on and flesh out the functions one at a time.
The Gaddis textbook Starting out with C++: From Control Structures Through Objects, which I teach out of, describes them this way (Sec. 6.16):
A stub is a dummy function that is called instead of the actual
function it represents. It usually displays a test message
acknowledging that it was called, and nothing more.
A function that takes arguments and does nothing with them can be used as a pair with a function that does something useful, such that the arguments are still evaluated even when the no-op function is used. This can be useful in logging scenarios, where the arguments must still be evaluated to verify the expressions are legal and to ensure any important side-effects occur, but the logging itself isn't necessary. The no-op function might be selected by the preprocessor when the compile-time logging level was set at a level that doesn't want output for that particular log statement.
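The logging arrangement described above could look roughly like this. Every name here (LOG_DEBUG_ENABLED, LOG_DEBUG, log_write, log_noop, next_id, do_work) is hypothetical; the point is only that the argument expression is still evaluated when the no-op is selected.

```c
#include <stdio.h>

/* Hypothetical compile-time switch; build with -DLOG_DEBUG_ENABLED=1 to log. */
#ifndef LOG_DEBUG_ENABLED
#define LOG_DEBUG_ENABLED 0
#endif

/* Does the real work. */
static void log_write(const char *msg, int value)
{
    fprintf(stderr, "%s: %d\n", msg, value);
}

/* Same signature, deliberately empty. */
static void log_noop(const char *msg, int value)
{
    (void)msg;
    (void)value;
}

/* The preprocessor picks which function the log statement calls. */
#if LOG_DEBUG_ENABLED
#define LOG_DEBUG log_write
#else
#define LOG_DEBUG log_noop
#endif

static int calls = 0;

/* An argument expression with a side effect. */
static int next_id(void)
{
    return ++calls;
}

/* next_id() runs whichever function LOG_DEBUG names. */
void do_work(void)
{
    LOG_DEBUG("handled request", next_id());
}
```

Had LOG_DEBUG been defined as an empty function-like macro instead, the call to next_id() would disappear together with the log statement, which is exactly the difference the paragraph above is pointing at.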
As I recall, there were two empty functions in Lions' Commentary on UNIX 6th Edition, with Source Code, and the introduction to the re-issue early this century called Ritchie, Kernighan and Thompson out on it.
The function that gobbles its argument and returns nothing is actually ubiquitous in C, but not written out explicitly because it is implicitly called on nearly every line. The most common use of this empty function, in traditional C, was the invisible discard of the value of any statement. But, since C89, this can be explicitly spelled as (void). The lint tool used to complain whenever a function return value was ignored without explicitly passing it to this built-in function that returns nothing. The motivation behind this was to try to prevent programmers from silently ignoring error conditions, and you will still run into some old programs that use the coding style, (void)printf("hello, world!\n");.
Such a function might be used for:
Callbacks (which the other answers have mentioned)
An argument to higher-order functions
Benchmarking a framework, with no overhead for the no-op being performed
Having a unique value of the correct type to compare other function pointers to. (Particularly in a language like C, where all function pointers are convertible and comparable with each other, but conversion between function pointers and other kinds of pointers is not portable.)
The sole element of a singleton value type, in a functional language
If passed an argument that it strictly evaluates, this could be a way to discard a return value but execute side-effects and test for exceptions
A dummy placeholder
Proving certain theorems in the typed Lambda Calculus
Another temporary use for a do-nothing function could be to have a line exist to put a breakpoint on, for example when you need to check the run-time values being passed into a newly created function so that you can make better decisions about what the code you're going to put in there will need to access. Personally, I like to use self-assignments, i.e. i = i when I need this kind of breakpoint, but a no-op function would presumably work just as well.
void MyBrandNewSpiffyFunction(TypeImNotFamiliarWith whoKnowsWhatThisVariableHas)
{
DoNothing(); // Yay! Now I can put in a breakpoint so I can see what data I'm receiving!
int i = 0;
i = i; // Another way to do nothing so I can set a breakpoint
}
From a language lawyer perspective, an opaque function call inserts a barrier for optimizations.
For example:
int a = 0;
extern void e(void);
int b(void)
{
++a;
++a;
return a;
}
int c(void)
{
++a;
e();
++a;
return a;
}
int d(void)
{
++a;
asm(" ");
++a;
return a;
}
The ++a expressions in the b function can be merged to a += 2, while in the c function, a needs to be updated before the function call and reloaded from memory after, as the compiler cannot prove that e does not access a, similar to the (non-standard) asm(" ") in the d function.
In the embedded firmware world, it could be used to add a tiny delay, required for some hardware reason. Of course, it could also be called as many times in a row as needed, letting the programmer lengthen the delay.
Empty functions are not uncommon in platform-specific abstraction layers. There are often functions that are only needed on certain platforms. For example, a function void native_to_big_endian(struct data* d) would contain byte-swapping code on a little-endian CPU but could be completely empty on a big-endian CPU. This helps keep the business logic platform-agnostic and readable. I've also seen this sort of thing done for tasks like converting native file paths to Unix/Windows style, hardware initialization functions (when some platforms can run with defaults and others must be actively reconfigured), etc.
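A sketch of what that might look like, assuming a one-field struct data (the original does not say what it contains) and the __BYTE_ORDER__ macros that GCC and Clang predefine. On other compilers the check fails and this sketch falls into the empty branch, which is only right if the target really is big-endian.

```c
#include <stdint.h>

/* Hypothetical payload; the text above only names "struct data". */
struct data {
    uint32_t value;
};

#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
/* Little-endian host: reverse the byte order of the field. */
void native_to_big_endian(struct data *d)
{
    uint32_t v = d->value;
    d->value = ((v & 0x000000FFu) << 24) |
               ((v & 0x0000FF00u) << 8)  |
               ((v & 0x00FF0000u) >> 8)  |
               ((v & 0xFF000000u) >> 24);
}
#else
/* Big-endian host (assumed when the check above fails): nothing to do. */
void native_to_big_endian(struct data *d)
{
    (void)d;
}
#endif
```

Callers invoke native_to_big_endian() unconditionally, and the empty version costs nothing once the compiler inlines or discards it.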
At the risk of being considered off-topic, I'm going to argue from a Thomistic perspective that a function that does nothing, and the concept of NULL in computing, really has no place anywhere in computing.
Software is constituted in substance by state, behavior, and control flow which belongs to behavior. To have the absence of state is impossible; and to have the absence of behavior is impossible.
Absence of state is impossible because a value is always present in memory, regardless of initialization state for the memory that is available. Absence of behavior is impossible because non-behavior cannot be executed (even "nop" instructions do something).
Instead, we might better state that there is negative and positive existence defined subjectively by the context with an objective definition being that negative existence of state or behavior means no explicit value or implementation respectively, while the positive refers to explicit value or implementation respectively.
This changes the perspective concerning the design of an API.
Instead of:
void foo(void (*bar)()) {
if (bar) { bar(); }
}
we instead have:
void foo();
void foo_with_bar(void (*bar)()) {
if (!bar) { fatal(__func__, "bar is NULL; callback required\n"); }
bar();
}
or:
void foo(bool use_bar, void (*bar)());
or if you want even more information about the existence of bar:
void foo(bool use_bar, bool bar_exists, void (*bar)());
Each of these is a better design that makes your code and intent well-expressed. The simple fact of the matter is that the existence of a thing or not concerns the operation of an algorithm, or the manner in which state is interpreted. Not only do you lose a whole value by reserving NULL with 0 (or any arbitrary value there), but you make your model of the algorithm less perfect and even error-prone in rare cases. What is more, on a system in which this reserved value is not reserved, the implementation might not work as expected.
If you need to detect the existence of an input, let that be explicit in your API: have a parameter or two for that if it's that important. It will be more maintainable and portable as well since you're decoupling logic metadata from inputs.
In my opinion, therefore, a function that does nothing is not practical to use, but a design flaw if part of the API, and an implementation defect if part of the implementation. NULL obviously won't disappear that easily, and we just use it because that's what currently is used by necessity, but in the future, it doesn't have to be that way.
Besides all the reasons already given here, note that an "empty" function is never truly empty, so you can learn a lot about how function calls work on your architecture of choice by looking at the assembly output. Let's look at a few examples. Let's say I have the following C file, nothing.c:
void Nothing(void) {}
Compile this on an x86_64 machine with clang -c -S nothing.c -o nothing.s and you'll get something that looks like this (stripped of metadata and other stuff irrelevant to this discussion):
nothing.s:
_Nothing: ## @Nothing
pushq %rbp
movq %rsp, %rbp
popq %rbp
retq
Hmm, that doesn't really look like nothing. Note the pushing and popping of %rbp (the frame pointer) onto the stack. Now let's change the compiler flags and add -fomit-frame-pointer, or more explicitly: clang -c -S nothing.c -o nothing.s -fomit-frame-pointer
nothing.s:
_Nothing: ## @Nothing
retq
That looks a lot more like "nothing", but you still have at least one x86_64 instruction being executed, namely retq.
Let's try one more. Clang supports the gcc gprof profiler option -pg so what if we try that: clang -c -S nothing.c -o nothing.s -pg
nothing.s:
_Nothing: ## @Nothing
pushq %rbp
movq %rsp, %rbp
callq mcount
popq %rbp
retq
Here we've added a mysterious additional call to a function mcount() that the compiler has inserted for us. This one looks like the least amount of nothing-ness.
And so you get the idea. Compiler options and architecture can have a profound impact on the meaning of "nothing" in a function. Armed with this knowledge you can make much more informed decisions about both how you write code, and how you compile it. Moreover, a function like this called millions of times and measured can give you a very accurate measure of what you might call "function call overhead", or the bare minimum amount of time required to make a call given your architecture and compiler options. In practice given modern superscalar instruction scheduling, this measurement isn't going to mean a whole lot or be particularly useful, but on certain older or "simpler" architectures, it might.
These functions have a great place in test driven development.
class Doer {
public:
int PerformComplexTask(int input) { return 0; } // just to make it compile
};
Everything compiles and the test cases say Fail until the function is properly implemented.

Can a function know what's calling it?

Can a function tell what's calling it, through the use of memory addresses maybe? For example, function foo(); gets data on whether it is being called in main(); rather than some other function?
If so, is it possible to change the content of foo(); based on what is calling it?
Example:
int foo()
{
if (being called from main())
printf("Hello\n");
if (being called from some other function)
printf("Goodbye\n");
}
This question might be kind of out there, but is there some sort of C trickery that can make this possible?
For highly optimized C it doesn't really make sense. The harder the compiler tries to optimize the less the final executable resembles the source code (especially for link-time code generation where the old "separate compilation units" problem no longer prevents lots of optimizations). At least in theory (but often in practice for some compilers) functions that existed in the source code may not exist in the final executable (e.g. may have been inlined into their caller); functions that didn't exist in the source code may be generated (e.g. compiler detects common sequences in many functions and "out-lines" them into a new function to avoid code duplication); and functions may be replaced by data (e.g. an "int abcd(uint8_t a, uint8_t b)" replaced by a abcd_table[a][b] lookup table).
For strict C (no extensions or hacks), no. It simply can't support anything like this because it can't expect that (for any compiler including future compilers that don't exist yet) the final output/executable resembles the source code.
An implementation defined extension, or even just a hack involving inline assembly, may be "technically possible" (especially if the compiler doesn't optimize the code well). The most likely approach would be to (ab)use debugging information to determine the caller from "what the function should return to when it returns".
A better way for a compiler to support a hypothetical extension like this may be for the compiler to use some of the optimizations I mentioned - specifically, split the original foo() into 2 separate versions where one version is only ever called from main() and the other version is used for other callers. This has the bonus of letting the compiler optimize out the branches too - it could become like int foo_when_called_from_main() { printf("Hello\n"); }, which could be inlined directly into the caller, so that neither version of foo exists in the final executable. Of course if foo() had other code that's used by all callers then that common code could be lifted out into a new function rather than duplicating it (e.g. so it might become like int foo_when_called_from_main() { printf("Hello\n"); foo_common_code(); }).
There probably isn't any hypothetical compiler that works like that, but there's no real reason you can't do these same optimizations yourself (and have it work on all compilers).
Note: Yes, this was just a crafty way of suggesting that you can/should refactor the code so that it doesn't need to know which function is calling it.
Knowing who called a specific function is essentially what a stack trace visualizes. There is no general, standard way of extracting that, though. In theory one could write code that targeted each system type the software would run on, and implement a stack trace function for each of them. In that case you could examine the stack and see what comes before the current function.
But with all that said and done, the question you should probably ask is why? Writing a function that functions in a specific way when called from a specific function is not well isolated logic. Instead you could consider passing in a parameter to the function that caused the change in logic. That would also make the result more testable and reliable.
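A small sketch of the parameter-based approach, reusing the question's foo(). The enum greeting and its values are invented for illustration, and foo() returns the text instead of printing it so that the result is easy to check.

```c
/* Hypothetical enum naming the behaviours foo() supports. */
enum greeting { GREET_HELLO, GREET_GOODBYE };

/* The caller states what it wants; foo() never inspects the call stack. */
const char *foo(enum greeting which)
{
    return which == GREET_HELLO ? "Hello\n" : "Goodbye\n";
}
```

main() would then call fputs(foo(GREET_HELLO), stdout) and any other function would pass GREET_GOODBYE, so the decision sits visibly at the call site and survives inlining and other optimizations.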
How to actually extract a stack trace has already received many answers here: How can one grab a stack trace in C?
I don't think an if statement in C can have a condition like the ones you describe.
If you want different output depending on whether the function is called from main(), you have to put the printf statement in main() and in the other function instead.
I don't really know what you are trying to achieve, but as I understand it, each caller could pass an additional argument that uniquely identifies it, in the form of a character array, an integer, or an enumeration.
For example:
enum caller { CALLER_MAIN, CALLER_ADD, CALLER_SUB, CALLER_DIV, CALLER_MUL };
and call functions like:
add(3, 5, CALLER_MAIN); // adds 3 and 5; called from main
You would have to extend the enumeration whenever you add more functions, but it's an easier way to do it.
No. The C language does not support obtaining the name or other information of who called a function.
As all other answers show, this can only be obtained using external tools, for example that use stack traces and compiler/linker emitted symbol tables.

C: Using static volatile with "getter" function and interruptions

Suppose I have the following C code:
/* clock.c */
#include "clock.h"
static volatile uint32_t clock_ticks;
uint32_t get_clock_ticks(void)
{
return clock_ticks;
}
void clock_tick(void)
{
clock_ticks++;
}
Now I am calling clock_tick (i.e.: incrementing the clock_ticks variable) within an interrupt, while calling get_clock_ticks() from the main() function (i.e.: outside the interrupt).
My understanding is that clock_ticks should be declared as volatile, as otherwise the compiler could optimize its access and make main() think the value has not changed (while it actually changed from the interrupt).
I wonder if using the get_clock_ticks(void) function there, instead of accessing the variable directly from main() (i.e.: not declaring it as static), can actually force the compiler to load the variable from memory even if it was not declared as volatile.
I wonder this because someone told me this could be happening. Is it true? Under which conditions? Should I always use volatile anyway, no matter whether I use a "getter" function?
A getter function doesn't help in any way here over using volatile.
Assume the compiler sees you've just fetched the value two lines above and not changed it since then.
If it's a good optimizing compiler, I would expect it to see that the function call has no side effects and simply optimize out the function call.
If get_clock_ticks() were external (i.e. in a separate module), matters would be different (maybe that's what you remember).
Something that can change its value outside normal program flow (e.g. in an ISR), should always be declared volatile.
Don't forget that even if you currently compile the code declaring get_clock_ticks and the code using it as separate modules, perhaps one day you will use link-time or cross-module optimisation. Keep the "volatile" even though you are using a getter function - it will do no harm to the code generation in this case, and makes the code correct.
One thing you have not mentioned is the bit size of the processor. If it is not capable of reading a 32-bit value in a single operation, then your get_clock_ticks() will sometimes fail as the reads are not atomic.
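One common workaround for the non-atomic read, sketched here under the assumption that ticks arrive far less often than the loop takes to run, is to read the counter twice and retry until both reads agree. Disabling interrupts around the read is the other usual option, but how to do that is platform-specific, so it is not shown.

```c
#include <stdint.h>

static volatile uint32_t clock_ticks;

/* Called from the interrupt handler. */
void clock_tick(void)
{
    clock_ticks++;
}

/* Read the counter twice and retry until both reads agree, so that a
   tick landing between the partial loads of one read is not returned
   as a torn value. */
uint32_t get_clock_ticks(void)
{
    uint32_t first, second;
    do {
        first = clock_ticks;
        second = clock_ticks;
    } while (first != second);
    return first;
}
```

On a processor that loads 32 bits in one operation the loop simply exits on the first pass.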

C function call with too few arguments

I am working on some legacy C code. The original code was written in the mid-90s, targeting Solaris and Sun's C compiler of that era. The current version compiles under GCC 4 (albeit with many warnings), and it seems to work, but I'm trying to tidy it up -- I want to squeeze out as many latent bugs as possible as I determine what may be necessary to adapt it to 64-bit platforms, and to compilers other than the one it was built for.
One of my main activities in this regard has been to ensure that all functions have full prototypes (which many did not have), and in that context I discovered some code that calls a function (previously un-prototyped) with fewer arguments than the function definition declares. The function implementation does use the value of the missing argument.
Example:
impl.c:
int foo(int one, int two) {
if (two) {
return one;
} else {
return one + 1;
}
}
client1.c:
extern foo();
int bar() {
/* only one argument(!): */
return foo(42);
}
client2.c:
extern int foo();
int (*foop)() = foo;
int baz() {
/* calls the same function as does bar(), but with two arguments: */
return (*foop)(17, 23);
}
Questions: is the result of a function call with missing arguments defined? If so, what value will the function receive for the unspecified argument? Otherwise, would the Sun C compiler of ca. 1996 (for Solaris, not VMS) have exhibited a predictable implementation-specific behavior that I can emulate by adding a particular argument value to the affected calls?
EDIT: I found a Stack Overflow thread, "C function with no parameters behavior", which gives a very succinct, specific, and accurate answer. PMG's comment at the end of the answer talks about UB. Below are my original thoughts, which I think are along the same lines and explain why the behaviour is UB.
Questions: is the result of a function call with missing arguments defined?
I would say no... The reason is that I think the function will operate as if it had the second parameter, but as explained below, that second parameter could just be junk.
If so, what value will the function receive for the unspecified argument?
I think the values received are undefined. This is why you could have UB.
There are two general ways of parameter passing that I'm aware of... (Wikipedia has a good page on calling conventions)
Pass by register. I.e., the ABI (Application Binary Interface) for the platform will say that registers x & y, for example, are for passing in parameters, and any more above that get passed via stack...
Everything gets passed via stack...
Thus when you give one module a definition of the function with "...unspecified (but not variable) number of parameters..." (the extern def), it will not place as many parameters as you give it (in this case 1) in either the registers or stack location that the real function will look in to get the parameter values. Therefore the second area for the second parameter, which is missed out, essentially contains random junk.
EDIT: Based on the other thread I found, I have amended the above: where I originally said the extern declared a function with no parameters, it now says it declared a function with "unspecified (but not variable) number of parameters".
When the program jumps to the function, that function assumes the parameter passing mechanism has been correctly obeyed, so it either looks in registers or the stack and uses whatever values it finds... assuming them to be correct.
Otherwise, would the Sun C compiler of ca. 1996 (for Solaris, not VMS) have exhibited a predictable implementation-specific behavior
You'd have to check your compiler documentation. I doubt it... the extern definition would be trusted completely so I doubt the registers or stack, depending on parameter passing mechanism, would get correctly initialised...
If the number or the types of arguments (after default argument promotions) do not match the ones used in the actual function definition, the behavior is undefined.
What will happen in practice depends on the implementation. The values of missing parameters will not be meaningfully defined (assuming the attempt to access missing arguments will not segfault), i.e. they will hold unpredictable and possibly unstable values.
Whether the program will survive such incorrect calls will also depend on the calling convention. A "classic" C calling convention, in which the caller is responsible for placing the parameters into the stack and removing them from there, will be less crash-prone in presence of such errors. The same can be said about calls that use CPU registers to pass arguments. Meanwhile, a calling convention in which the function itself is responsible for cleaning the stack will crash almost immediately.
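For the clean-up itself, a sketch of what the repaired code might look like once a full prototype is in scope. The second argument passed in bar() is a placeholder assumption only; the value the original author intended has to be worked out from the program's logic.

```c
/* Full prototype, as it would appear in a header shared by all clients. */
int foo(int one, int two);

int foo(int one, int two)
{
    if (two) {
        return one;
    } else {
        return one + 1;
    }
}

/* With the prototype visible, foo(42) no longer compiles, so the caller
   has to choose a value for `two`. The 0 here is only a placeholder. */
int bar(void)
{
    return foo(42, 0);
}
```

The benefit is that every remaining mismatch becomes a compile-time error instead of undefined behaviour at run time.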
It is very unlikely that the bar function ever gave consistent results in the past. The only thing I can imagine is that it is always called on fresh stack space and the stack space was cleared upon startup of the process, in which case the second parameter would be 0. Or the difference between returning one and one+1 didn't make a big difference in the bigger scope of the application.
If it really is like you depict in your example, then you are looking at a big fat bug. In the distant past there was a coding style where vararg functions were implemented by specifying more parameters than passed, but just as with modern varargs you should not access any parameters not actually passed.
I assume that this code was compiled and run on the Sun SPARC architecture. According to this ancient SPARC web page: "registers %o0-%o5 are used for the first six parameters passed to a procedure."
In your example with a function expecting two parameters, with the second parameter not specified at the call site, it is likely that register %o1 always happened to have a sensible value when the call was made.
If you have access to the original executable and can disassemble the code around the incorrect call site, you might be able to deduce what value %o1 had when the call was made. Or you might try running the original executable on a SPARC emulator, like QEMU. In any case this won't be a trivial task!

Why doesn't gcc remove this check of a non-volatile variable?

This question is mostly academic. I ask out of curiosity, not because this poses an actual problem for me.
Consider the following incorrect C program.
#include <signal.h>
#include <stdio.h>
static int running = 1;
void handler(int u) {
running = 0;
}
int main() {
signal(SIGTERM, handler);
while (running)
;
printf("Bye!\n");
return 0;
}
This program is incorrect because the handler interrupts the program flow, so running can be modified at any time and should therefore be declared volatile. But let's say the programmer forgot that.
gcc 4.3.3, with the -O3 flag, compiles the loop body (after one initial check of the running flag) down to the infinite loop
.L7:
jmp .L7
which was to be expected.
Now we put something trivial inside the while loop, like:
while (running)
putchar('.');
And suddenly, gcc does not optimize the loop condition anymore! The loop body's assembly now looks like this (again at -O3):
.L7:
movq stdout(%rip), %rsi
movl $46, %edi
call _IO_putc
movl running(%rip), %eax
testl %eax, %eax
jne .L7
We see that running is re-loaded from memory each time through the loop; it is not even cached in a register. Apparently gcc now thinks that the value of running could have changed.
So why does gcc suddenly decide that it needs to re-check the value of running in this case?
In the general case it's difficult for a compiler to know exactly which objects a function might have access to and therefore could potentially modify. At the point where putchar() is called, GCC doesn't know if there might be a putchar() implementation that might be able to modify running so it has to be somewhat pessimistic and assume that running might in fact have been changed.
For example, there might be a putchar() implementation later in the translation unit:
int putchar( int c)
{
running = c;
return c;
}
Even if there's not a putchar() implementation in the translation unit, there could be something that might, for example, pass the address of the running object such that putchar might be able to modify it:
void foo(void)
{
set_putchar_status_location( &running);
}
Note that your handler() function is globally accessible, so putchar() might call handler() itself (directly or otherwise), which is an instance of the above situation.
On the other hand, since running is visible only to the translation unit (being static), by the time the compiler gets to the end of the file it should be able to determine that there is no opportunity for putchar() to access it (assuming that's the case), and the compiler could go back and 'fix up' the pessimization in the while loop.
Since running is static, the compiler might be able to determine that it's not accessible from outside the translation unit and make the optimization you're talking about. However, since it's accessible through handler() and handler() is accessible externally, the compiler can't optimize the access away. Even if you make handler() static, it's accessible externally since you pass the address of it to another function.
Note that in your first example, even though what I mentioned in the above paragraph is still true the compiler can optimize away the access to running because the 'abstract machine model' the C language is based on doesn't take into account asynchronous activity except in very limited circumstances (one of which is the volatile keyword and another is signal handling, though the requirements of the signal handling aren't strong enough to prevent the compiler being able to optimize away the access to running in your first example).
In fact, here's something the C99 standard says about the abstract machine behavior in pretty much these exact circumstances:
5.1.2.3/8 "Program execution"
EXAMPLE 1:
An implementation might define a one-to-one correspondence between abstract and actual semantics: at every sequence point, the values of the actual objects would agree with those specified by the abstract semantics. The keyword volatile would then be redundant.
Alternatively, an implementation might perform various optimizations within each translation unit, such that the actual semantics would agree with the abstract semantics only when making function calls across translation unit boundaries. In such an implementation, at the time of each function entry and function return where the calling function and the called function are in different translation units, the values of all externally linked objects and of all objects accessible via pointers therein would agree with the abstract semantics. Furthermore, at the time of each such function entry the values of the parameters of the called function and of all objects accessible via pointers therein would agree with the abstract semantics. In this type of implementation, objects referred to by interrupt service routines activated by the signal function would require explicit specification of volatile storage, as well as other implementation defined restrictions.
Finally, you should note that the C99 standard also says:
7.14.1.1/5 "The signal function"
If the signal occurs other than as the result of calling the abort or raise function, the behavior is undefined if the signal handler refers to any object with static storage duration other than by assigning a value to an object declared as volatile sig_atomic_t...
So strictly speaking the running variable may need to be declared as:
volatile sig_atomic_t running = 1;
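Put together, the flag and handler might look like the sketch below. The helpers install_handler() and is_running() are hypothetical names added so the pieces can be exercised without an endless loop; the declaration is also made static here, as in the question.

```c
#include <signal.h>

static volatile sig_atomic_t running = 1;

/* The handler does nothing except assign to a volatile sig_atomic_t. */
static void handler(int signum)
{
    (void)signum;
    running = 0;
}

/* Hypothetical helper: install the handler for SIGTERM.
   Returns 0 on success and -1 on failure. */
int install_handler(void)
{
    return signal(SIGTERM, handler) == SIG_ERR ? -1 : 0;
}

/* Hypothetical helper: the loop condition, re-read from memory each call. */
int is_running(void)
{
    return running != 0;
}
```

The question's loop then becomes while (is_running()) ; or simply while (running) ; and the compiler may no longer hoist the load out of it.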
Because the call to putchar() could change the value of running (GCC only knows that putchar() is an external function and does not know what it does - for all GCC knows putchar() could call handler()).
GCC probably assumes that the call to putchar can modify any global variable, including running.
Take a look at the pure function attribute, which states that the function does not have side-effects on the global state. I suspect if you replace putchar() with a call to a "pure" function, GCC will reintroduce the loop optimization.
Thank you all for your answers and comments. They have been very helpful, but none of them provide the full story. [Edit: Michael Burr's answer now does, making this somewhat redundant.] I'll sum up here.
Even though running is static, handler is not static; therefore it might be called from putchar and change running in that way. Since the implementation of putchar is not known at this point, it could conceivably call handler from the body of the while loop.
Suppose handler were static. Can we optimize away the running check then? The answer is no, because the signal implementation is also outside this compilation unit. For all gcc knows, signal might store the address of handler somewhere (which, in fact, it does), and putchar might then call handler through this pointer even though it has no direct access to that function.
So in what cases can the running check be optimized away? It seems that this is only possible if the loop body does not call any functions from outside this translation unit, so that it is known at compilation time what does and does not happen inside the loop body.
This explains why forgetting a volatile is not such a big deal in practice as it might seem at first.
putchar can change running.
Only link-time analysis could, in theory, determine that it doesn't.

Resources