GCC Optimization and debugging - c

I don't understand exactly the following:
When using debugging and optimization together, the internal rearrangements carried out by the optimizer can make it difficult to see what is going on when examining an optimized program in the debugger. For example, the ordering of statements may be changed.
What i understand is when i build a program with the -g option, then the executable will contain a symbolic table which contains variable, function names, references to them and their line-numbers.
And when i build with an optimization option, for example the ordering of instructions may be changed depends on the optimization.
What i don't understand is, why debugging is more difficult.
I would like to see an example, and an easy to understand explanation.

An example that might happen:
int calc(int a, int b)
{
return a << b + 7;
}
int main()
{
int x = 5;
int y = 7;
int val = calc(x, y);
return val;
}
Optimized this might be the same as
int main()
{
return 642;
}
A contrived example, but trying to debug that kind of optimization in actual code isn’t simple. Some debuggers may show all lines of code marked when stepping through, some might skip them all, some may be confused. And the developer at least is.

simple example:
int a = 4;
int b = a;
int c = b;
printf("%d", c);
can be optimized as:
printf("%d", 4);
In fact in optimized compiles, the compiler might well do exactly this (in machine code of course)
When debugging we the debugger will allow us to inspect the memory associated by a,b and c but when the top version get optimized into the bottom version a,b and c no longer exist in RAM. This makes inspecting RAM a lot harder to figure out what is going on.

When you compile using the optimization flag you are ensured that the output of the program will be compliant to the code you wrote, but the code itself will variate from the one you actually compiled.
As you pointed out, the code will be rearranged and some call will be performed differently. Also another optimization could be loop unrolling, branch prediction and functions calls simplification. These optimizations will also vary on the architecture you are running on.
For all these reasons (and others) your code may become very difficult to debug, since it is transparent for you what the compiler exactly does, thus meaning that the code you want to debug may not look like the one you wrote.

Related

Null statement execution time in C

How does the compiler interpret null statements in C? In terms of execution time. ( empty ";" i.e., without any expression)
And will it optimize code during execution if it encounters null statements, by removing them.
Compilers only care about observable behaviour. Whether you compile
int main() {
;;;;;;;;;;;;;;;;;;
return 0;
}
or
int main() {
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
return 0;
}
does not make any difference regarding the resulting executable. The observable behaviour of both examples is the same.
If you want to convince yourself, look at the compilers output (this is a great tool: https://godbolt.org/z/bnbxiP) or try to profile the above examples (but dont expect to get meaningful numbers ;).
My suggestion is to not think about code as a way to talk to your cpu. When you write code you are not expressing instructions for your cpu. Code is rather a recipe for the compiler and your compiler knows much better how to intruct the cpu than any human. Small difference but I think it helps.

How consistently C compilers optimize unreachable code?

Say you have (for reasons that are not important here) the following code:
int k = 0;
... /* no change to k can happen here */
if (k) {
do_something();
}
Using the -O2 flag, GCC will not generate any code for it, recognizing that the if test is always false.
I'm wondering if this is a pretty common behaviour across compilers or it is something I should not rely on.
Does anybody knows?
Dead code elimination in this case is trivial to do for any modern optimizing compiler. I would definitely rely on it, given that optimizations are turned on and you are absolutely sure that the compiler can prove that the value is zero at the moment of check.
However, you should be aware that sometimes your code has more potential side effects than you think.
The first source of problems is calling non-inlined functions. Whenever you call a function which is not inlined (i.e. because its definition is located in another translation unit), compiler assumes that all global variables and the whole contents of the heap may change inside this call. Local variables are the lucky exception, because compiler knows that it is illegal to modify them indirectly... unless you save the address of a local variable somewhere. For instance, in this case dead code won't be eliminated:
int function_with_unpredictable_side_effects(const int &x);
void doit() {
int k = 0;
function_with_unpredictable_side_effects(k);
if (k)
printf("Never reached\n");
}
So compiler has to do some work and may fail even for local variables. By the way, I believe the problem which is solved in this case is called escape analysis.
The second source of problems is pointer aliasing: compiler has to take into account that all sort of pointers and references in your code may be equal, so changing something via one pointer may change the contents at the other one. Here is one example:
struct MyArray {
int num;
int arr[100];
};
void doit(int idx) {
MyArray x;
x.num = 0;
x.arr[idx] = 7;
if (x.num)
printf("Never reached\n");
}
Visual C++ compiler does not eliminate the dead code, because it thinks that you may access x.num as x.arr[-1]. It may sound like an awful thing to do to you, but this compiler has been used in gamedev area for years, and such hacks are not uncommon there, so the compiler stays on the safe side. On the other hand, GCC removes the dead code. Maybe it is related to its exploitation of strict pointer aliasing rule.
P.S. The const keywork is never used by optimizer, it is only present in C/C++ language for programmers' convenience.
There is no pretty common behaviour across compilers. But there is a way to explore how different compilers acts with specific part of code.
Compiler explorer will help you to answer on every question about code generation, but of course you must be familiar with assembler language.

Return a struct directly or fill a pointer?

Let's say I have the following function to initialize a data structure:
void init_data(struct data *d) {
d->x = 5;
d->y = 10;
}
However, with the following code:
struct data init_data(void) {
struct data d = { 5, 10 };
return d;
}
Wouldn't this be optimized away due to copy elision and be just as performant as the former version?
I tried to do some tests on godbolt to see if the assembly was the same, but when using any optimization flags everything was always entirely optimized away, with nothing left but something like this: movabsq $42949672965, %rax, and I am not sure if the same would happen in real code.
The first version I provided seems to be very common in C libraries, and I do not understand why as they should be both just as fast with RVO, with the latter requiring less code.
The first version I provided seems to be very common in C libraries, and I do not understand why as they should be both just as fast with
RVO, with the latter requiring less code.
The main reason for the first being so common is historic. The second way of initializing structures from literals was not standard (well, it was, but only for static initializers and never for automatic variables) and it's never allowed on assignments (well, I've not checked the status of the recent standards) Even, in ancient C, a simple assignment as:
struct A a, b;
...
a = b; /* this was not allowed a long time ago */
was not accepted at all.
So, in order to be able to compile code in every platform, you have to write the old way, as normally, modern compilers allow you to compile legacy code, while the opposite (old compilers accepting new code) is not possible.
And this also applies to returning structures or passing them by value. Apart of being normally a huge waste of resources (it's common to see the whole structure being copied in the stack or copied back to the proper place, once the function returns) old compilers didn't accept these, so to be portable, you must avoid to use these constructs.
Finally a comment: don't use your compiler to check if both constructs generate the same code, as probably it does... but you'll get the wrong assumption that this is common, and you'll run into error. Another different implementation can (and is allowed to do) different translation and result in different code.

Frama-C code slicer not loading any C files

I have a 1000 lines C file with 10 maths algorithms written by a professor, I need to delete 9 maths functions and all their dependencies from the 1000 lines, so i am having a go using Frama-C Boron windows binary installer.
Now it won't load the simplest example.c file... i select source file and nothing loads.
Boron edition is from 2010 so i checked how to compile a later Frama-C: they say having a space in my windows 7 user name can cause problems, which is not encouraging.
Is Frama-C my best option for my slicing task?
Here is an example file that won't load:
// File swap.c:
/*# requires \valid(a) && \valid(b);
# ensures A: *a == \old(*b) ;
# ensures B: *b == \old(*a) ;
# assigns *a,*b ;
#*/
void swap(int *a,int *b)
{
int tmp = *a ;
*a = *b ;
*b = tmp ;
return ;
}
Here is the code i wish to take only one function from, the option labelled smooth and swvd. https://sites.google.com/site/kootsoop/Home/cohens_class_code
I looked at the code you linked to, and it does not seem like the best candidate for a Frama-C analysis. For instance, that code is not strictly C99-conforming, using e.g. some old-style prototypes (including implicit int return types), functions that are used before they are defined without forward declarations (fft), and a missing header inclusion (stdlib.h). Those are not big issues, since the changes are relatively simple, and some of them are treated similarly to how gcc -std=c99 works: they emit warnings but not errors. However, it's important to notice that they do require a non-zero amount of time, therefore this won't be a "plug-and-play" solution.
On a more general note, Frama-C relies on CIL (C Intermediate Language) for C code normalization, so the sliced program will probably not be identical to "the original program minus the sliced statements". If the objective is merely to remove some statements but keep the code syntactically identical otherwise, then Frama-C will not be ideal1.
Finally, it is worth noting that some Frama-C analyses can help finding dead code, and the result is even clearer if the code is already split into functions. For instance, using Value analysis on a properly configured program, it is possible to see which statements/functions are never executed. But this does rely on the absence of at least some kinds of undefined behavior.
E.g. if uninitialized variables are used in your program (which is forbidden by the C standard, but occasionally happens and goes unnoticed), the Value analysis will stop its propagation and the code afterwards may be marked as dead, since it is semantically dead w.r.t. the standard. It's important to be aware of that, since a naive approach would be misleading.
Overall, for the code size you mention, I'm not sure Frama-C would be a cost-effective approach, especially if (1) you have never used Frama-C and are having trouble compiling it (Boron is a really old release, not recommended) and (2) if you already know your code base, and therefore would be relatively proficient in manually slicing its parts.
1That said, I do not know of any C slicer that preserves statements like that; my point is that, while intuitively one might think that a C slicer would straightforwardly preserve most of the syntax, C is such a devious language that doing so is very hard, hence why most tools will do some normalization steps beforehand.

Do temp variables slow down my program?

Suppose I have the following C code:
int i = 5;
int j = 10;
int result = i + j;
If I'm looping over this many times, would it be faster to use int result = 5 + 10? I often create temporary variables to make my code more readable, for example, if the two variables were obtained from some array using some long expression to calculate the indices. Is this bad performance-wise in C? What about other languages?
A modern optimizing compiler should optimize those variables away, for example if we use the following example in godbolt with gcc using the -std=c99 -O3 flags (see it live):
#include <stdio.h>
void func()
{
int i = 5;
int j = 10;
int result = i + j;
printf( "%d\n", result ) ;
}
it will result in the following assembly:
movl $15, %esi
for the calculation of i + j, this is form of constant propagation.
Note, I added the printf so that we have a side effect, otherwise func would have been optimized away to:
func:
rep ret
These optimizations are allowed under the as-if rule, which only requires the compiler to emulate the observable behavior of a program. This is covered in the draft C99 standard section 5.1.2.3 Program execution which says:
In the abstract machine, all expressions are evaluated as specified by
the semantics. An actual implementation need not evaluate part of an
expression if it can deduce that its value is not used and that no
needed side effects are produced (including any caused by calling a
function or accessing a volatile object).
Also see: Optimizing C++ Code : Constant-Folding
This is an easy task to optimize for an optimizing compiler. It will delete all variables and replace result with 15.
Constant folding in SSA form is pretty much the most basic optimization there is.
The example you gave is easy for a compiler to optimize. Using local variables to cache values pulled out of global structures and arrays can actually speed up execution of your code. If for instance you are fetching something from a complex structure inside a for loop where the compiler can't optimize and you know the value isn't changing, the local variables can save quite a bit of time.
You can use GCC (other compilers too) to generate the intermediate assembly code and see what the compiler is actually doing.
There is discussion of how to turn on the assembly listings here:Using GCC to produce readable assembly?
It can be instructive to examine the generated code and see what a compiler is actually doing.
While all sorts of trivial differences to the code can perturb the compiler's behavior in ways that mildly improve or worsen performance, in principle it it should not make any performance difference whether you use temp variables like this as long as the meaning of the program is not changed. A good compiler should generate the same, or comparable, code either way, unless you're intentionally building with optimization off in order to get machine code that's as close as possible to the source (e.g. for debugging purposes).
You're suffering the same problem I do when I'm trying to learn what a compiler does--you make a trivial program to demonstrate the problem, and examine the assembly output of the compiler, only to realize that the compiler has optimized everything you tried to get it to do away. You may find even a rather complex operation in main() reduced to essentially:
push "%i"
push 42
call printf
ret
Your original question is not "what happens with int i = 5; int j = 10...?" but "do temporary variables generally incur a run-time penalty?"
The answer is probably not. But you'd have to look at the assembly output for your particular, non-trivial code. If your CPU has a lot of registers, like an ARM, then i and j are very likely to be in registers, just the same as if those registers were storing the return value of a function directly. For example:
int i = func1();
int j = func2();
int result = i + j;
is almost certainly to be exactly the same machine code as:
int result = func1() + func2();
I suggest you use temporary variables if they make the code easier to understand and maintain, and if you're really trying to tighten a loop, you'll be looking into the assembly output anyway to figure out how to finesse as much performance out as possible. But don't sacrifice readability and maintainability for a few nanoseconds, if that's not necessary.

Resources