How consistently do C compilers optimize unreachable code?

Say you have (for reasons that are not important here) the following code:
int k = 0;
... /* no change to k can happen here */
if (k) {
do_something();
}
Using the -O2 flag, GCC will not generate any code for it, recognizing that the if test is always false.
I'm wondering whether this is pretty common behaviour across compilers or something I should not rely on.
Does anybody know?

Dead code elimination in this case is trivial for any modern optimizing compiler. I would definitely rely on it, given that optimizations are turned on and you are absolutely sure that the compiler can prove the value is zero at the moment of the check.
However, you should be aware that sometimes your code has more potential side effects than you think.
The first source of problems is calling non-inlined functions. Whenever you call a function which is not inlined (e.g. because its definition is located in another translation unit), the compiler assumes that all global variables and the whole contents of the heap may change inside that call. Local variables are the lucky exception, because the compiler knows that it is illegal to modify them indirectly... unless you save the address of a local variable somewhere. For instance, in this case the dead code won't be eliminated:
#include <cstdio>

int function_with_unpredictable_side_effects(const int &x);

void doit() {
    int k = 0;
    function_with_unpredictable_side_effects(k);   /* k is passed by reference, so its address escapes */
    if (k)
        printf("Never reached\n");
}
So the compiler has to do some work and may fail even for local variables. By the way, I believe the analysis that has to be performed in this case is called escape analysis.
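For contrast, here is a sketch of my own (not from the question) where the callee only receives a copy, so the local cannot escape and the compiler is free to drop the branch:

#include <stdio.h>

int function_taking_a_copy(int x);   /* hypothetical; defined in another translation unit */

void doit(void) {
    int k = 0;
    function_taking_a_copy(k);       /* gets a copy of k, so it cannot modify the local */
    if (k)                           /* provably still zero, so the branch is dead */
        printf("Never reached\n");
}

With -O2, GCC and Clang typically remove both the test and the printf call here.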
The second source of problems is pointer aliasing: the compiler has to take into account that all sorts of pointers and references in your code may refer to the same object, so a store through one pointer may change the contents seen through another. Here is one example:
#include <stdio.h>

struct MyArray {
    int num;
    int arr[100];
};

void doit(int idx) {
    struct MyArray x;
    x.num = 0;
    x.arr[idx] = 7;
    if (x.num)
        printf("Never reached\n");
}
The Visual C++ compiler does not eliminate the dead code, because it thinks that you may access x.num as x.arr[-1]. That may sound like an awful thing to do, but this compiler has been used in game development for years, and such hacks are not uncommon there, so the compiler stays on the safe side. GCC, on the other hand, removes the dead code. Maybe that is related to its exploitation of the strict aliasing rule.
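For reference, the strict aliasing rule mentioned above lets the compiler assume that lvalues of incompatible types do not refer to the same object. A minimal sketch of my own illustrating what that permits:

int value = 0;

int set_and_check(float *f) {
    value = 1;
    *f = 2.0f;        /* under strict aliasing, assumed not to touch value (float vs. int) */
    return value;     /* so this may be folded to the constant 1 */
}

GCC at -O2 (which enables -fstrict-aliasing) will usually return the constant directly, whereas -fno-strict-aliasing forces it to reload value after the store.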
P.S. The const keyword is never used by the optimizer; it is only present in the C/C++ language for programmers' convenience.

There is no guaranteed common behaviour across compilers, but there is a way to explore how different compilers act on a specific piece of code.
Compiler Explorer will help you answer almost any question about code generation, but of course you must be familiar with assembly language.
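For instance, pasting a small test case such as the following (a sketch based on the code in the question) and looking for the call in the generated assembly is usually enough:

void do_something(void);

void test(void) {
    int k = 0;
    /* no change to k can happen here */
    if (k)
        do_something();   /* with -O2, no call to do_something should appear in the output */
}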

Related

GCC Optimization and debugging

I don't understand exactly the following:
When using debugging and optimization together, the internal rearrangements carried out by the optimizer can make it difficult to see what is going on when examining an optimized program in the debugger. For example, the ordering of statements may be changed.
What I understand is that when I build a program with the -g option, the executable will contain a symbol table with variable and function names, references to them, and their line numbers.
And when I build with an optimization option, the ordering of instructions may be changed, depending on the optimization.
What I don't understand is why debugging becomes more difficult.
I would like to see an example and an easy-to-understand explanation.
An example that might happen:
int calc(int a, int b)
{
    return a << b + 7;    /* note: parses as a << (b + 7) */
}

int main()
{
    int x = 5;
    int y = 7;
    int val = calc(x, y); /* 5 << 14 == 81920 */
    return val;
}
Optimized this might be the same as
int main()
{
    return 81920;
}
A contrived example, but trying to debug that kind of optimization in actual code isn’t simple. Some debuggers may show all lines of code marked when stepping through, some might skip them all, some may be confused. And the developer at least is.
simple example:
int a = 4;
int b = a;
int c = b;
printf("%d", c);
can be optimized as:
printf("%d", 4);
In fact, in optimized builds the compiler might well do exactly this (in machine code, of course).
When debugging, the debugger lets us inspect the memory associated with a, b and c, but once the top version gets optimized into the bottom version, a, b and c no longer exist in RAM. That makes it much harder to inspect memory and figure out what is going on.
When you compile with an optimization flag, you are assured that the observable output of the program will match the code you wrote, but the generated code itself will differ from what you actually wrote.
As you pointed out, the code will be rearranged and some calls will be performed differently. Other optimizations include loop unrolling, branch-prediction hints and simplification of function calls. These optimizations also vary with the architecture you are targeting.
For all these reasons (and others) your code may become very difficult to debug, since what the compiler actually does is not visible to you, which means the code you want to debug may not look like the code you wrote.
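As a practical aside (my own addition, not from the answers above): GCC also has an optimization level intended for exactly this situation, -Og, which enables optimizations that do not interfere with debugging, so a common compromise is something like:

gcc -Og -g myprog.c -o myprog

You still get reasonably fast code, but stepping through it in a debugger stays much closer to the source.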

Return a struct directly or fill a pointer?

Let's say I have the following function to initialize a data structure:
void init_data(struct data *d) {
    d->x = 5;
    d->y = 10;
}
However, with the following code:
struct data init_data(void) {
    struct data d = { 5, 10 };
    return d;
}
Wouldn't this be optimized away due to copy elision and be just as performant as the former version?
I tried to do some tests on godbolt to see if the assembly was the same, but when using any optimization flags everything was always entirely optimized away, with nothing left but something like this: movabsq $42949672965, %rax, and I am not sure if the same would happen in real code.
The first version I provided seems to be very common in C libraries, and I do not understand why as they should be both just as fast with RVO, with the latter requiring less code.
The first version I provided seems to be very common in C libraries, and I do not understand why as they should be both just as fast with RVO, with the latter requiring less code.
The main reason for the first being so common is historical. The second way of initializing structures from literals was not standard (well, it was, but only for static initializers, never for automatic variables), and it was never allowed in assignments (well, I've not checked the status of the recent standards). Even in ancient C, a simple assignment such as:
struct A a, b;
...
a = b; /* this was not allowed a long time ago */
was not accepted at all.
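(For what it's worth, C99 and later do allow both; a small sketch of my own, assuming the struct data from the question holds two ints:)

struct data { int x; int y; };              /* as in the question */

void example(void) {
    struct data d1 = { .x = 5, .y = 10 };   /* designated initializer (C99) */

    struct data d2;
    d2 = (struct data){ 5, 10 };            /* assignment from a compound literal (C99) */
}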
So, in order to be able to compile code on every platform, you had to write it the old way; modern compilers normally allow you to compile legacy code, while the opposite (old compilers accepting new code) is not possible.
And this also applies to returning structures or passing them by value. Apart from normally being a huge waste of resources (it is common to see the whole structure copied onto the stack, or copied back to the proper place once the function returns), old compilers didn't accept these constructs, so to be portable you had to avoid them.
Finally, a comment: don't use your compiler to check whether both constructs generate the same code. It probably does, but you would draw the wrong conclusion that this is universal, and you would run into errors. A different implementation can (and is allowed to) translate the code differently and produce different code.

Why was mixing declarations and code forbidden up until C99?

I have recently become a teaching assistant for a university course which primarily teaches C. The course standardized on C90, mostly due to widespread compiler support. One of the very confusing concepts to C newbies with previous Java experience is the rule that variable declarations and code may not be intermingled within a block (compound statement).
This limitation was finally lifted with C99, but I wonder: does anybody know why it was there in the first place? Does it simplify variable scope analysis? Does it allow the programmer to specify at which points of program execution the stack should grow for new variables?
I assume the language designers wouldn't have added such a limitation if it had absolutely no purpose at all.
In the very beginning of C, the available memory and CPU resources were really scarce, so the language had to compile really fast with minimal memory requirements.
Therefore the C language was designed to require only a very simple compiler which compiles fast. This in turn led to the "single-pass compiler" concept: the compiler reads the source file and translates everything into assembler code as soon as possible - usually while reading the source file. For example, when the compiler reads the definition of a global variable, the appropriate code is emitted immediately.
This trait is visible in C up until today:
C requires "forward declarations" of all and everything. A multi-pass compiler could look forward and deduce the declarations of variables of functions in the same file by itself.
This in turn makes the *.h files necessary.
When compiling a function, the layout of the stack frame must be computed as soon as possible - otherwise the compiler would have to make several passes over the function body.
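A small illustration of the first point (my own example, not from the answer): with a single-pass compiler, the callee must be declared before the call, because the compiler emits code for the call as soon as it reads it.

int square(int x);          /* forward declaration: the signature is now known */

int twice_squared(int x) {
    return 2 * square(x);   /* can be translated immediately, in a single pass */
}

int square(int x) {         /* the definition may appear later in the file */
    return x * x;
}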
Nowadays no serious C compiler is still "single pass", because many important optimizations cannot be done within one pass. A little more can be found on Wikipedia.
The standards body took quite some time to relax that "single-pass" requirement with regard to function bodies. I assume other things were considered more important.
It was that way because it had always been done that way, it made writing compilers a little easier, and nobody had really thought of doing it any other way. In time people realised that it was more important to favour making life easier for language users rather than compiler writers.
I assume the language designers wouldn't have added such a limitation if it had absolutely no purpose at all.
Don't assume that the language designers set out to restrict the language. Often restrictions like this arise by chance and circumstance.
I guess it should be easier for a non-optimising compiler to produce efficient code this way:
int a;
int b;
int c;
...
Although 3 separate variables are declared, the stack pointer only needs to be adjusted once, without optimising strategies such as reordering, etc.
Compare this to:
int a;
foo();
int b;
bar();
int c;
Adjusting the stack pointer just once here requires a kind of optimisation, although not a very advanced one.
Moreover, as a stylistic issue, the first approach encourages a more disciplined way of coding (no wonder Pascal also enforces this): you can see all the local variables in one place and possibly inspect them together as a whole. This provides a clearer separation between code and data.
Requiring that variable declarations appear at the start of a compound statement did not impair the expressiveness of C89. Anything that one could legitimately do using a mid-block declaration could be done just as well by adding an opening brace before the declaration and doubling up the closing brace of the enclosing block. While such a requirement may sometimes have cluttered source code with extra opening and closing braces, such braces would not have been just noise--they would have marked the beginning and end of the variables' scopes.
Consider the following two code examples:
{
    do_something_1();
    {
        int foo;
        foo = something1();
        if (foo) do_something_1(foo);
    }
    {
        int bar;
        bar = something2();
        if (bar) do_something_2(bar);
    }
    {
        int boz;
        boz = something3();
        if (boz) do_something_3(boz);
    }
}
and
{
    do_something_1();
    int foo;
    foo = something1();
    if (foo) do_something_1(foo);
    int bar;
    bar = something2();
    if (bar) do_something_2(bar);
    int boz;
    boz = something3();
    if (boz) do_something_3(boz);
}
From a run-time perspective, most modern compilers probably wouldn't care whether foo is syntactically in scope during the execution of do_something_3(), since they can determine that any value it held before that statement will not be used afterwards. On the other hand, encouraging programmers to write declarations in a way which generates sub-optimal code in the absence of an optimizing compiler is hardly an appealing concept.
Further, while handling the simpler cases involving intermixed variable declarations would not be difficult (even a 1970s compiler could have done it, had the authors wanted to allow such constructs), things become more complicated if the block containing intermixed declarations also contains any goto or case labels (see the sketch below). The creators of C probably thought that allowing variable declarations to be intermixed with other statements would complicate the standard too much to be worth the benefit.
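A sketch of the kind of complication they may have had in mind (my own example, using the C99 rules that were eventually adopted): a jump can enter the scope of a mid-block variable while skipping its initialization.

void f(int skip)
{
    if (skip)
        goto done;      /* the jump skips the declaration below */

    int x = 42;         /* a declaration mixed in with the code */
done:
    /* x is in scope here, but if we jumped, its initialization never ran,
       so its value is indeterminate; jumping into the scope of a
       variable-length array would not be allowed at all */
    return;
}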
Back in the days of C's youth, when Dennis Ritchie worked on it, computers (the PDP-11, for example) had very limited memory (e.g. 64K words), so the compiler had to be small and had to optimize very few things, very simply. And at that time (I coded in C on a Sun-4/110 in the 1986-89 era), declaring register variables was really useful for the compiler.
Today's compilers are much more complex. For example, a recent version of GCC (4.6) has more than 5 or 10 million lines of source code (depending upon how you measure it) and performs a great many optimizations which did not exist when the first C compilers appeared.
And today's processors are also very different (you cannot assume that today's machines are just like machines from the 1980s, only thousands of times faster with thousands of times more RAM and disk). Today the memory hierarchy is very important: cache misses are what the processor spends most of its time on (waiting for data from RAM). In the 1980s access to memory was almost as fast (or as slow, by current standards) as the execution of a single machine instruction. This is completely false today: to read from a RAM module, your processor may have to wait several hundred nanoseconds, while for data in the L1 cache it can execute more than one instruction per nanosecond.
So don't think of C as a language very close to the hardware: this was true in the 1980s, but it is false today.
Oh, but you could (in a way) mix declarations and code; declaring new variables was just limited to the start of a block. For example, the following is valid C89 code:
void f()
{
    int a;
    do_something();
    {
        int b = do_something_else();
    }
}

C89, Mixing Variable Declarations and Code

I'm very curious to know why exactly C89 compilers will dump on you when you try to mix variable declarations and code, like this for example:
rutski@imac:~$ cat test.c
#include <stdio.h>
int
main(void)
{
    printf("Hello World!\n");
    int x = 7;
    printf("%d!\n", x);
    return 0;
}
rutski@imac:~$ gcc -std=c89 -pedantic test.c
test.c: In function ‘main’:
test.c:7: warning: ISO C90 forbids mixed declarations and code
rutski@imac:~$
Yes, you can avoid this sort of thing by staying away from -pedantic. But then your code is no longer standards compliant. And as anybody capable of answering this post probably already knows, this is not just a theoretical concern: compilers like Microsoft's C compiler enforce this quirk of the standard under any and all circumstances.
Given how ancient C is, I would imagine that this feature is due to some historical issue dating back to the extraordinary hardware limitations of the 70's, but I don't know the details. Or am I totally wrong there?
The C standard said "thou shalt not" because it was not allowed in the earlier C compilers which the C89 standard standardized. It was a radical enough step to create a language that could be used for writing an operating system and its utilities. The concept probably simply wasn't considered - no other language at the time allowed it (Pascal, Algol, PL/1, Fortran, COBOL), so C didn't need to either. And allowing it probably makes the compiler slightly harder to write (bigger), and the original compilers were space-constrained by the 64 KiB code and 64 KiB data space allowed on the big machines of the PDP-11 series (the smaller ones only allowed 64 KiB for both code and data, AFAIK). So extra complexity was not a good idea.
It was C++ that allowed declarations and statements to be interleaved, but C++ is not, and never has been, C. However, C99 finally caught up with C++ (it is a useful feature). Sadly, though, Microsoft never implemented C99.
It was probably never implemented that way, because it was never needed.
Suppose you want to write something like this in plain C:
int myfunction(int value)
{
    if (value == 0)
        return 0;
    int result = value * 2;
    return result;
}
Then you can easily rewrite this in valid C, like this:
int myfunction(int value)
{
    int result;
    if (value == 0)
        return 0;
    result = value * 2;
    return result;
}
There is absolutely no performance impact from declaring the variable first and setting its value later.
However, in C++, this is not the case anymore.
In the following example, function2 will be slower than function1:
double function1(const Factory &factory)
{
    if (!factory.isWorking())
        return 0;
    Product product(factory.makeProduct());
    return product.getQuantity();
}

double function2(const Factory &factory)
{
    Product product;
    if (!factory.isWorking())
        return 0;
    product = factory.makeProduct();
    return product.getQuantity();
}
In function2 the product variable needs to be constructed, even when the factory is not working.
Later, the factory makes the product, and then the assignment operator needs to copy the product (from the return value of makeProduct to the product variable). In function1, product is only constructed when the factory is working, and even then it is the copy constructor that is called, not the default constructor followed by the assignment operator.
However, I would expect a good C++ compiler nowadays to optimize this code, although the first C++ compilers probably didn't.
A second example is the following:
double function1(const Factory &factory)
{
    if (!factory.isWorking())
        return 0;
    Product &product = factory.getProduct();
    return product.getQuantity();
}

double function2(const Factory &factory)
{
    Product &product;    // Invalid: a reference must be bound when it is declared.
    if (!factory.isWorking())
        return 0;
    product = factory.getProduct();  // This would assign to the referenced object, not rebind the reference.
    return product.getQuantity();
}
In this example, function2 is simply invalid. A reference can only be bound to an object at declaration time, not later.
This means that in this example the only way to write valid code is to put the declaration at the point where the variable is really initialized, not sooner.
Both examples show why C++ really needed to allow variable declarations after other executable statements, rather than only at the beginning of a block as in C.
This explains why the feature was added to C++, and not to C (and other languages) where it isn't really needed.
It is similar to requiring functions to be declared before they are used - it allows a simple-minded compiler to operate in one pass, from top to bottom, emitting object code as it goes.
In this particular case, the compiler can go through the declarations, adding up the stack space required. When it reaches the first statement, it can output the code to adjust the stack, allocating space for the locals, immediately before the start of the function code proper.
It is much easier to write a compiler for a language which requires all variables to be declared at the start of a function. Some languages even require variables to be declared in a specific clause outside of the function code (Pascal and Smalltalk come to mind).
The reason is that it's easier to map these variables onto the stack (or onto registers, if your compiler is smart enough) if they are all known up front and don't change.
Any other statements (especially function calls) may modify the stack/registers, making the variable mapping more complex.

How do I know if gcc agrees that something is volatile?

Consider the following:
volatile uint32_t i;
How do I know if gcc did or did not treat i as volatile? It would be declared as such because no nearby code is going to modify it, and modification of it is likely due to some interrupt.
I am not the world's worst assembly programmer, but I play one on TV. Can someone help me to understand how it would differ?
If you take the following stupid code:
#include <stdio.h>
#include <inttypes.h>

volatile uint32_t i;

int main(void)
{
    if (i == 64738)
        return 0;
    else
        return 1;
}
If I compile it to object format and disassemble it via objdump, then do the same after removing 'volatile', there is no difference (according to diff). Is the volatile declaration just too close to where it's checked or modified, or should I just always use some atomic type when declaring something volatile? Do some optimization flags influence this?
Note, my stupid sample does not fully match my question, I realize this. I'm only trying to find out whether gcc did or did not treat the variable as volatile, so I'm studying small dumps to try to find the difference.
Many compilers in some situations don't treat volatile the way they should. See this paper if you deal much with volatile, to avoid nasty surprises: Volatiles Are Miscompiled, and What to Do about It. It also contains a pretty good description of volatile, backed by quotations from the standard.
To be 100% sure, and for such a simple example, check out the assembly output.
Try setting the variable outside a loop and reading it inside the loop. In a non-volatile case, the compiler might (or might not) shove it into a register or make it a compile time constant or something before the loop, since it "knows" it's not going to change, whereas if it's volatile it will read it from the variable space every time through the loop.
Basically, when you declare something as volatile, you're telling the compiler not to make certain optimizations. If it ends up not making those optimizations anyway, you can't tell whether it skipped them because the variable was declared volatile, or because it needed those registers for something else, or because it simply didn't notice that it could turn the value into a compile-time constant.
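A minimal sketch of that experiment (my own example, not from the answer): a flag that is set from outside normal control flow, e.g. by an interrupt handler, and polled in a loop.

#include <stdint.h>

volatile uint32_t flag;        /* e.g. set from an interrupt handler */

void wait_for_flag(void) {
    while (flag == 0) {        /* volatile: a fresh load of flag on every iteration */
        /* spin */
    }
}

If flag were not volatile, the compiler would be allowed to load it once before the loop (or fold the test into a constant), since nothing inside the loop modifies it; with volatile, the reload has to stay.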
As far as I know, volatile affects what the optimizer is allowed to do. For example, if your code looked like this:
int foo() {
    int x = 0;
    while (x);
    return 42;
}
The "while" loop would be optimized out of the binary.
But if you declare 'x' as volatile (i.e., volatile int x = 0;), then the compiler will leave the loop alone.
Your little sample is inadequate to show anything. The difference between a volatile variable and one that isn't is that each load or store of the volatile variable in the source code has to generate exactly one load or store in the executable, whereas the compiler is free to optimize away loads or stores of non-volatile variables. If you're getting one load of i in your sample, that's exactly what I'd expect for both volatile and non-volatile.
To show a difference, you're going to have to have redundant loads and/or stores. Try something like
int i = 5;
int j = i + 2;
i = 5;
i = 5;
printf("%d %d\n", i, j);
changing i between non-volatile and volatile. You may have to enable some level of optimization to see the difference.
The code there has three stores and two loads of i, which can be optimized away to one store and probably one load if i is not volatile. If i is declared volatile, all stores and loads should show up in the object code in order, no matter what the optimization. If they don't, you've got a compiler bug.
It should always treat it as volatile.
The reason the code is the same is that volatile just instructs the compiler to load the variable from memory each time it is accessed. Even with optimization on, the compiler still needs to load i from memory once in the code you've written, because it can't infer the value of i at compile time. If you access it repeatedly, you'll see a difference.
Any modern compiler has multiple stages. One of the fairly easy yet interesting questions is whether the declaration of the variable itself was parsed correctly. This is easy because the C++ name mangling should differ depending on the volatile-ness. Hence, if you compile twice, once with volatile defined away, the symbol tables should differ slightly.
Read the standard before you misquote or downvote. Here's a quote from n2798:
7.1.6.1 The cv-qualifiers
7 Note: volatile is a hint to the implementation to avoid aggressive optimization involving the object because the value of the object might be changed by means undetectable by an implementation. See 1.9 for detailed semantics. In general, the semantics of volatile are intended to be the same in C++ as they are in C.
The keyword volatile acts as a hint, much like the register keyword. However, volatile asks the compiler to keep its optimizations at bay: it should not keep a copy of the variable in a register or a cache (to optimize speed of access), but rather fetch it from memory every time you request it.
Since there is so much confusion: some more. The C99 standard does in fact say that a volatile-qualified object must be looked up every time it is read, and so on, as others have noted. But there is also another section that says that what constitutes a volatile access is implementation-defined. So a compiler which knows the hardware inside out may know, for example, that an automatic volatile-qualified variable whose address is never taken will not be placed in a sensitive region of memory, and it may well ignore the hint and optimize it away.
This keyword finds use in setjmp/longjmp style error handling. The main thing you have to bear in mind is that you supply the volatile keyword when you think the variable may change behind the compiler's back. That is, you could take an ordinary object and manage with a few casts.
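A sketch of that setjmp/longjmp case (my own example): for a local that is modified between setjmp and longjmp, the standard only guarantees its value after the jump back if it is volatile-qualified.

#include <setjmp.h>
#include <stdio.h>

static jmp_buf env;

static void fail(void) {
    longjmp(env, 1);
}

void demo(void) {
    volatile int progress = 0;   /* without volatile, its value after longjmp is indeterminate */
    if (setjmp(env) == 0) {
        progress = 1;            /* modified after setjmp ... */
        fail();                  /* ... and then we jump back */
    } else {
        printf("progress = %d\n", progress);  /* volatile makes this reliably print 1 */
    }
}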
Another thing to keep in mind is that the standard leaves the definition of what constitutes a volatile access to the implementation.
If you really want to see different assembly, compile with optimization enabled.
