Why would gcc change the order of functions in a binary? - c

Many questions about forcing the order of functions in a binary to match the order of the source file
For example, this post, that post and others
I can't understand why would gcc want to change their order in the first place?
What could be gained from that?
Moreover, why is toplevel-reorder default value is true?

GCC can change the order of functions, because the C standard (e.g. n1570 or newer) allows to do that.
There is no obligation for GCC to compile a C function into a single function in the sense of the ELF format. See elf(5) on Linux
In practice (with optimizations enabled: try compiling foo.c with gcc -Wall -fverbose-asm -O3 foo.c then look into the emitted foo.s assembler file), the GCC compiler is building intermediate representations like GIMPLE. A big lot of optimizations are transforming GIMPLE to better GIMPLE.
Once the GIMPLE representation is "good enough", the compiler is transforming it to RTL
On Linux systems, you could use dladdr(3) to find the nearest ELF function to a given address. You can also use backtrace(3) to inspect your call stack at runtime.
GCC can even remove functions entirely, in particular static functions whose calls would be inline expanded (even without any inline keyword).
I tend to believe that if you compile and link your entire program with gcc -O3 -flto -fwhole-program some non static but unused functions can be removed too....
And you can always write your own GCC plugin to change the order of functions.
If you want to guess how GCC works: download and study its source code (since it is free software) and compile it on your machine, invoke it with GCC developer options, ask questions on GCC mailing lists...
See also the bismon static source code analyzer (some work in progress which could interest you), and the DECODER project. You can contact me by email about both. You could also contribute to RefPerSys and use it to generate GCC plugins (in C++ form).

What could be gained from that?
Optimization. If the compiler thinks some code is like to be used a lot it may put that code in a different region than code which is not expected to execute often (or is an error path, where performance is not as important). And code which is likely to execute after or temporally near some other code should be placed nearby, so it is more likely to be in cache when needed.
__attribute__((hot)) and __attribute__((cold)) exist for some of the same reasons.
why is toplevel-reorder default value is true?
Because 99% of developers are not bothered by this default, and it makes programs faster. The 1% of developers who need to care about ordering use the attributes, profile-guided optimization or other features which are likely to conflict with no-toplevel-reorder anyway.

Related

How can I get the GCC compiler to not optimize a standard library function call like 'printf'?

Is there a way that GCC does not optimize any function calls?
In the generated assembly code, the printf function is replaced by putchar. This happens even with the default -O0 minimal optimization flag.
#include <stdio.h>
int main(void) {
printf("a");
return 0;
}
(Godbolt is showing GCC 9 doing it, and Clang 8 keeping it unchanged.)
Use -fno-builtin to disable all replacement and inlining of standard C functions with equivalents. (This is very bad for performance in code that assumes memcpy(x,y, 4) will compile to just an unaligned/aliasing-safe load, not a function call. And disables constant-propagation such as strlen of string literals. So normally you'd want to avoid that for practical use.)
Or use -fno-builtin-FUNCNAME for a specific function, like -fno-builtin-printf.
By default, some commonly-used standard C functions are handled as builtin functions, similar to __builtin_popcount. The handler for printf replaces it with putchar or puts
if possible.
6.59 Other Built-in Functions Provided by GCC
The implementation details of a C statement like printf("a") are not considered a visible side effect by default, so they aren't something that get preserved. You can still set a breakpoint at the call site and step into the function (at least in assembly, or in source mode if you have debug symbols installed).
To disable other kinds of optimizations for a single function, see __attribute__((optimize(0))) on a function or #pragma GCC optimize. But beware:
The optimize attribute should be used for debugging purposes only. It is not suitable in production code.
You can't disable all optimizations. Some optimization is inherent in the way GCC transforms through an internal representation on the way to assembly. See Disable all optimization options in GCC.
E.g., even at -O0, GCC will optimize x / 10 to a multiplicative inverse.
It still stores everything to memory between C statements (for consistent debugging; that's what -O0 really means); GCC doesn't have a "fully dumb" mode that tries to transliterate C to assembly as naively as possible. Use tcc for that. Clang and ICC with -O0 are somewhat more literal than GCC, and so is MSVC debug mode.
Note that -g never has any effect on code generation, only on the metadata emitted. GCC uses other options (mostly -O, -f*, and -m*) to control code generation, so you can always safely enable -g without hurting performance, other than a larger binary. It's not debug mode (that's -O0); it's just debug symbols.

what gcc compiler options can I use for gfortran

I studied Option Summary for gfortran but found no compiler option to detect integer overflow. Then I found the GCC (GNU Compiler Collection) flag option -fsanitize=signed-integer-overflow here and used it when invoking gfortran. It works--integer overflow can be detected at run time!
So what does -fsanitize=signed-integer-overflow do here? Just adding to the machine code generated by gfortran some machine-level pieces that check integer overflow?
What is the relation between GCC (GNU Compiler Collection) flag options and gfortran compiler options ? What gcc compiler options can I use for gfortran, g++ etc ?
There is the GCC - GNU Compiler Collection. It shares the common backend and middleend and has frontends for different languages. For example frontends for C, C++ and Fortran which are usually invoked by commands gcc, g++ and gfortran.
It is actually more complicated, you can call gcc on a Fortran source and gfortran on a C source and it will work almost the same with the exceptions of libraries being linked (there are some other fine points). The appropriate frontend will be called based on the file extension or the language requested.
You can look almost all GCC (not just gcc) flags for all of the mentioned frontends. There are certain flags which are language specific. Normally you will get a warning like
gfortran -fcheck=all source.c
cc1: warning: command line option ‘-fcheck=all’ is valid for Fortran but not for C
but the file will compile fine, the option is just ignored and you will get a warning about that. Notice it is a C file and it is compiled by the gfortran command just fine.
The sanitization options are AFAIK not that language specific and work for multiple languages implemented in GCC, maybe with some exceptions for some obviously language specific checks. Especially -fsanitize=signed-integer-overflow which you ask about works perfectly fine for both C and C++. Signed integer overwlow is undefined behaviour in C and C++ and it is not allowed by the Fortran standard (which effectively means the same, Fortran just uses different words).
This isn't a terribly precise answer to your question, but an aha! moment, when learning about compilers, is learning that gcc (the GNU Compiler Collection), like llvm, is an example of a three-stage compiler.
The ‘front end’ parses the syntax of whichever language you're interested, and spits out an Abstract Syntax Tree (AST), which represents your program in a language-independent way.
Then the ‘middle end’ (terrible name, but ‘the clever bit’) reorganises that AST into another AST which is semantically equivalent but easier to turn into machine code.
Then the ‘back end’ turns that reorganised AST into assembler for one-or-other processor, possibly doing platform-specific micro-optimisations along the way.
That's why the (huge number of) gcc/llvm options are unexpectedly common to (apparently wildly) different languages. A few of the options are specific to C, or Fortran, or Objective-C, or whatever, but the majority of them (probably) are concerned with the middle and last bits, and so are common to all of the languages that gcc/llvm supports.
Thus the various options are specific to stage 1, 2 or 3, but may not be conveniently labelled as such; with this in mind, however, you might reasonably intuit what is and isn't relevant to the particular language you're interested in.
(It's for this sort of reason that I will dogmatically claim that CC++FortranJavaPerlPython is essentially a single language, with only trivial syntactical and library minutiae to distinguish between dialects).

gcc generating a list of function and variable names

I am looking for a way to get a list of all the function and variable names for a set of c source files. I know that gcc breaks down those elements when compiling and linking so is there a way to piggyback that process? Or any other tool that could do the same thing?
EDIT: It's mostly because I am curious, I have been playing with things like make auto dependency and graphing include trees and would like to be able to get more stats on the source files. And it seems like something that would already exist but i haven't found any options or flags for it.
If you are only interested by names of global functions and variables, you might (assuming you are on Linux) use the nm or objdump utilities on the ELF binary executable or object files.
Otherwise, you might customize the GCC compiler (assuming you have a recent version, e.g. 5.3 or 6 at least) thru plugins. You could code them directly in C++, or you might consider using GCC MELT, a Lisp-like domain specific language to customize GCC. Perhaps even the findgimple mode of GCC MELT might be enough....
If you consider extending GCC, be aware that you'll need to spend a significant time (perhaps months) understanding its internal representations (notably Generic Trees & Gimple) in details. The links and slides on GCC MELT documentation page might be useful.
Your main issue is that you probably need to understand most of the details about GCC internal representations, and that takes time!
Also, the details of GCC internals are slightly changing from one version of GCC to the next one.
You could also consider (instead of working inside GCC) using the Clang/LLVM framework (but learning that is also a lot of time). Maybe you might also look into Frama-C or Coccinnelle.
Another approach might be to compile with debug info and parse DWARF information.
PS. My point is that your problem is probably much more difficult than what you believe. Parsing C is not that simple ... You might spend months or even years working on that... And details could be target-processor & system & compiler specific...

Size optimization options

I am trying to sort out an embedded project where the developers took the option of including all the h and c files into a c file, then they can compile just that one file with the -whole-program option to get good size optimization.
I hate this and am determined to make this into a traditional program just using LTO to achieve the same.
The versions included with the dev kit are;
aps-gcc (GCC) 4.7.3 20130524 (Cortus)
GNU ld (GNU Binutils) 2.22
With one .o file .text is 0x1c7ac, fractured into 67 .o files .text comes out as 0x2f73c, I added the LTO stuff and reduced it to 0x20a44, good but nowhere near enough.
I have tried --gc-sections and using the linker plugin option but they made no further improvment.
Any suggestions, am I see the right sort of improvement from LTO?
To get LTO to work perfectly you need to have the same information and optimisation algorithms available at link stage as you have at compile stage. The GNU tools cannot do this and I believe this was actually one of the motivating factors in the creation of LLVM/Clang.
If you want to inspect the difference in detail I'd suggest you generate a Map file (ld option -Map <filename>) for each option and see if there are functions which haven't been in-lined or functions that are larger. The lack of in-lining you can manually resolve by forcing those functions to inline by moving the definition of the function into a header file and defining it as extern inline which effectively turns it into a macro (this is a GNU extension).
Larger functions are likely not being subject to constant propagation and I don't think there's anything you can do about that. You can make some improvements by carefully declaring the function attributes such as const, leaf, noreturn, pure, and returns_nonnull. These effectively promise that the function will behave in a particular way that the compiler may otherwise detect if using a single compilation unit, and that allow additional optimisations.
In contrast, Clang can compile your object code to a special kind of bytecode (LLVM stands for Low Level Virtual Machine, like JVM is Java Virtual Machine, and runs bytecode) and then optimisation of this bytecode can be performed at link time (or indeed run-time, which is cool). Since this bytecode is what is optimised whether you do LTO or not, and the optimisation algorithms are common between the compiler and the linker, in theory Clang/LLVM should give exactly the same results whether you use LTO or not.
Unfortunately now that the C backend has been removed from LLVM I don't know of any way to use the LLVM LTO capabilities for the custom CPU you're targeting.
In my opinion, the method chosen by the previous developers is the correct one. It is the method that gives the compiler the most information and thus the most opportunities to perform the optimizations that you want. It is a terrible way to compile (any change will require the whole project to be compiled) so marking this as just an option is a good idea.
Of course, you would have to run all your integration tests against such a build, but that should be trivial to do. What is the downside of the chosen approach except for compilation time (which shouldn't be an issue because you don't need to build in that manner all the time ... just for integration tests).

Do I really need libgcc?

I've been using GCC 4.6.2 on Mac OS X 10.6. I use the -static-libgcc option when I compile, otherwise my binaries look for libgcc on the system and I'm not sure anything over GCC 4.2 is supported on OS X. This works fine, but why do I even need libgcc? I read up on it and the GNU docs say it contains "arithmetic operations that the target processor cannot perform directly." How do I know what these operations are? And why are they so complex that I need to include this library? Why can't GCC just optimize the code directly instead of having to resort to these library functions? I'm a little confused. Any insight into this would be appreciated!
Yes, you do need it .... probably. If you don't need it then statically linking it is harmless. You can tell if you need it by using the -t link trace option (I think).
There are various things that you cant do in one instruction (typically things like 64-bit operations on 32-bit architectures). These things can be done, but if they use a non-trivial number of instructions then it's more space-efficient to have them all in one place.
When you disable optimization using -O0 (that's actually the default anyway) then GCC pretty much always uses the libgcc routines.
When you enable speed optimization then GCC may choose to insert the instruction sequence directly into the code (if it knows how). You may find that it ends up using none of the libgcc versions - it will certainly use fewer libgcc calls.
When you enable size optimizations then GCC may prefer the function call, or may not - it depends on what the GCC developers think is the best speed/size trade-off in each case. Note that even when you tell it to optimize for speed, the compiler may judge that some functions are unlikely to be used, and optimize those for size - even more so if you use PGO.
Basically, you can think of it in the same way as memcpy or the math-library functions: the compiler will inline functions it judges to be beneficial, and call library functions otherwise. The compiler can "inline" standard functions and libgcc function without looking at the library definition, of course - it just "knows" what they do.
Whether to use static or dynamic libgcc is an interesting trade-off. On the one hand, a dynamic (shared) library will use less memory across your whole system, and is more likely to be cached, etc. On the other hand, a static libgcc has a lower call overhead.
The most important thing though is compatibility. Obviously the libgcc library has to be present for your program to run, but it also has to be a compatible version. You're ok on a Linux distro with a stable GCC version, but otherwise static linking is safer.
I hope that answers your questions.

Resources