clang, lto, prevent function removal - c

I am compiling a project with a modified version of clang using link time optimization (lto) and O2 optimization level. O0 and O1 are doing fine, but sadly O2 removes some calls to functions. Is there a way to tell the optimization to omit specific functions?
I have already tried using volatile variables as well as __attribute__ ((optimize("0"))) without success.
Solutions only available directly on llvm IR level are also welcome.
Edit: Maybe I should explain the situation with a little more detail.
The modified clang adds calls to a custom runtime lib which is build together with clang.
Some of this inserted calls get optimized away.

I believe __attribute(used)__ (GCC) or llvm.used (LLVM) is what you're looking for.

Adding __attribute__((noinline)) will keep the so-designated functions from disappearing. You could also prevent it globally with -fno-inline.

Related

Examples of 'falign-loops' optimisation occuring?

One pass run by the compiler when optimising in gcc is falign-loops.
Although a vague description is provided here: https://www.intel.com/content/www/us/en/develop/documentation/cpp-compiler-developer-guide-and-reference/top/compiler-reference/compiler-options/compiler-option-details/data-options/falign-loops-qalign-loops.html
It is listed as one of the optimisations occurring with the -O2 flag here:
https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
I have been unable to actually see it work in action with any piece of code I have tried using compiler explorer. Does anyone know how the flag functions and perhaps have some explicit examples?
Thanks

Size optimization options

I am trying to sort out an embedded project where the developers took the option of including all the h and c files into a c file, then they can compile just that one file with the -whole-program option to get good size optimization.
I hate this and am determined to make this into a traditional program just using LTO to achieve the same.
The versions included with the dev kit are;
aps-gcc (GCC) 4.7.3 20130524 (Cortus)
GNU ld (GNU Binutils) 2.22
With one .o file .text is 0x1c7ac, fractured into 67 .o files .text comes out as 0x2f73c, I added the LTO stuff and reduced it to 0x20a44, good but nowhere near enough.
I have tried --gc-sections and using the linker plugin option but they made no further improvment.
Any suggestions, am I see the right sort of improvement from LTO?
To get LTO to work perfectly you need to have the same information and optimisation algorithms available at link stage as you have at compile stage. The GNU tools cannot do this and I believe this was actually one of the motivating factors in the creation of LLVM/Clang.
If you want to inspect the difference in detail I'd suggest you generate a Map file (ld option -Map <filename>) for each option and see if there are functions which haven't been in-lined or functions that are larger. The lack of in-lining you can manually resolve by forcing those functions to inline by moving the definition of the function into a header file and defining it as extern inline which effectively turns it into a macro (this is a GNU extension).
Larger functions are likely not being subject to constant propagation and I don't think there's anything you can do about that. You can make some improvements by carefully declaring the function attributes such as const, leaf, noreturn, pure, and returns_nonnull. These effectively promise that the function will behave in a particular way that the compiler may otherwise detect if using a single compilation unit, and that allow additional optimisations.
In contrast, Clang can compile your object code to a special kind of bytecode (LLVM stands for Low Level Virtual Machine, like JVM is Java Virtual Machine, and runs bytecode) and then optimisation of this bytecode can be performed at link time (or indeed run-time, which is cool). Since this bytecode is what is optimised whether you do LTO or not, and the optimisation algorithms are common between the compiler and the linker, in theory Clang/LLVM should give exactly the same results whether you use LTO or not.
Unfortunately now that the C backend has been removed from LLVM I don't know of any way to use the LLVM LTO capabilities for the custom CPU you're targeting.
In my opinion, the method chosen by the previous developers is the correct one. It is the method that gives the compiler the most information and thus the most opportunities to perform the optimizations that you want. It is a terrible way to compile (any change will require the whole project to be compiled) so marking this as just an option is a good idea.
Of course, you would have to run all your integration tests against such a build, but that should be trivial to do. What is the downside of the chosen approach except for compilation time (which shouldn't be an issue because you don't need to build in that manner all the time ... just for integration tests).

inlining C code : -flto or not -flto

One of my recent program highly depends on inlining a few "hot" functions for performance. These hot functions are part of an external .c file which I would prefer not to change.
Unfortunately, while Visual is pretty good at this exercise, gcc and clang are not. Apparently, due to the fact that the hot functions are within a different .c, they can't inline them.
This leaves me with 2 options :
Either include directly the relevant code into the target file. In practice, that means #include "perf.c" instead of #include "perf.h". Trivial change but it looks ugly. Clearly it works. It's just a little bit more complex to explain to the build chain that perf.c must be there but not be compiled nor linked.
Use -flto, for Link Time Optimisation. It looks cleaner, and is what Visual achieves by default.
The problem is, with -flto, gcc linking stage generates multiple warnings, which seem to be internal bugs (they refer to portion of code from within the standard libs, so I have little control over them). This is embarrassing when targeting a "zero warning" policy (even though the binary generated is perfectly fine).
As to clang, it just fails with -flto, due to packaging error (error loading plugin: LLVMgold.so) which is apparently very common accross multiple linux distros.
2 questions :
Is there a way to turn off these warning messages when using -flto on gcc ?
Which of the 2 methods described above methods seems the better one, given pro and con ?
Optional : is there another solution ?
According to your comment you have to suport gcc 4.4. As LTO started with gcc 4.5 (with all caution about early versions), the answer should be clearly. no -flto.
So, #include the code with all due caution, of course.
Update:
The file-extension should not be .c, though, but e.g. .inc (.i is also a bad idea). Even better: .h and change the functions to static inline. That still might not guarantee inlining, but that's the same as for all functions and it maintains the appearance of a clean header (although a longer inline function still is bad style).
Before doing all this, I'd properly profile, if the code really has a problem. One should concentrate on writing readable and maintainable code in the first place.

Clang or GCC equivalent of _PGOPTI_Prof_Dump_All() from ICC

Intel C(++) Compiler has very useful functions to help with profile guided optimisation.
_PGOPTI_Prof_Reset_All();
/* code */
_PGOPTI_Prof_Dump_All();
https://software.intel.com/en-us/node/512800
This is particularly useful for profiling shared libraries which one would use with ctypes in Python.
I've been trying to figure out if either Clang or GCC have similar functionality – apparently not.
Profile guided optimization works differently in gcc and it is enabled with compiler switches. See this question for PGO with gcc.
PGO just recently arrived in clang and is only available starting at version 3.5. The clang user manual gives an overview of how to use it.
It turns out that both have an internal and not properly documented function named __gcov_flush which does this. It is only explained in the source.
/* Called before fork or exec - write out profile information
gathered so far and reset it to zero. This avoids duplication or
loss of the profile information gathered so far. */
It's not quite as convenient as the Intel equivalent though and requires some gymnastics to make it work.

Function specific optimization in GCC 4.4.3

In reference to my earlier question here, I found out a possilbe bug in GCC 4.4.3 when it did not support following pragmas in the source code for optimization (although it says 4.4.x onwards it does!)
#pragma GCC optimize ("O3")
__attribute__((optimize("O3")))
Tried both above options but both gave compile time errors in the compiler itself(See the error message snapshot posted in the link mentioned above)
Now are there any further options for me to enable different optimization levels for different functions in my C code?
From the online docs:
Numbers are assumed to be an optimization level. Strings that begin with O are assumed to be an optimization option, while other options are assumed to be used with a -f prefix.
So, if you want the equivalent of the command line -O3 you should probably use the just the number 3 instead of "O3".
I agree that this is a bug and should not generate an ICE, consider reporting it along with a small test case to the GCC guys.
Now are there any further options for me to enable different optimization levels for different functions
in my C code?
Your remaining option is to place the functions in their own .c file and compile that .c file with the optimization flag you want.

Resources