Examples of 'falign-loops' optimisation occurring? - c

One of the optimisations gcc applies when optimising is -falign-loops.
A vague description is provided here: https://www.intel.com/content/www/us/en/develop/documentation/cpp-compiler-developer-guide-and-reference/top/compiler-reference/compiler-options/compiler-option-details/data-options/falign-loops-qalign-loops.html
It is also listed as one of the optimisations enabled by the -O2 flag here:
https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
I have been unable to see it in action with any piece of code I have tried in Compiler Explorer. Does anyone know how the flag works, and perhaps have some explicit examples?
Thanks
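A minimal sketch of a test case for the question above, assuming an x86-64 target and a reasonably recent gcc. The function name sum_array is hypothetical; any simple counted loop should do.

```c
#include <stddef.h>

/* Hypothetical test function: a simple counted loop whose head the
   compiler may choose to align when -falign-loops is in effect. */
long sum_array(const int *data, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; ++i) {
        total += data[i];
    }
    return total;
}
```

Compiling this once with gcc -O2 -S and once with gcc -O2 -fno-align-loops -S and comparing the two assembly files should show whether an alignment directive (typically .p2align on x86-64) appears before the loop label. Compiler Explorer's output filters may hide assembler directives, so the directives filter may need to be switched off before the difference becomes visible there.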

Related

-Xassembler and -Xpreprocessor examples

I recently got my hands dirty with assembly and C code and came across the gcc options -Xassembler and -Xpreprocessor. I searched online for simple examples and for the values these options take, but couldn't find any.
Help appreciated.
Thank you.
-Xassembler: It passes an option through to the assembler, for example an architecture-specific option that GCC itself does not recognize. It is similar to -Wa, although the way arguments are passed differs. For completeness, I am used to seeing -Wa rather than -Xassembler; I guess backward compatibility explains why there are two similar options.
An example for -Xassembler (ARM arch): -Xassembler -mthumb to assemble for Thumb architectures (or -Wa,-mthumb).
-Xpreprocessor: It passes an option through to the preprocessor; as before, it is useful for options that GCC doesn't recognize. It is similar to -Wp, although the way arguments are passed differs.
An example for -Xpreprocessor: -Xpreprocessor -M (or -Wp,-M) in order to output a rule suitable for make describing the dependencies of the main source file.

GCC optimization flag problems

I am having a problem with some C code built with gcc compiler. The code in question has an enum, whose values are used as cases in a switch statement to configure an object. Fairly standard stuff.
When I compile the code using the -O0 option flag, everything builds and runs correctly no problem. However, when the flag is set to -O2 the code no longer works as expected.
When I step through the code and put a watch on the local variables, the enum, which should only ever be one of three enum values, is actually -104! This causes the program to fail to configure the object.
Has anyone encountered this before who could provide some guidance? I haven't encountered this before and would appreciate if someone could explain why the compiler does this so I can make any necessary changes.
Snippet of code in question:
value = 0u;
switch (test_config) {
    case DISABLE:
        break;
    case INTERNAL:
        value = 1u;
        break;
    case EXTERNAL:
        value = 2u;
        break;
    default:
        valid = FALSE;
        break;
}
if (valid) {
    configure_test(value);
}
Enum in question:
typedef enum {
    DISABLE,
    INTERNAL,
    EXTERNAL
} test_config_t;
This is the code that is causing the problem. I initially didn't include it because I didn't want the question to be "please fix my code"; rather, I have been googling for reasons why gcc optimisation flags would produce different results for the same piece of code and haven't found anything particularly helpful. Also, I am not at my computer and had to type this on my phone, which doesn't help. So I came here because there are experts here who know way more than me and could point me in the right direction.
Some more info that I probably should have included: the code runs on hardware, which might also be the problem, and I am looking into that as well. When run from the FSBL the code works with -O0, but not with -O2. So it may be hardware, but then I don't know why it works one way and not the other.
You don't give enough details (your question doesn't show any actual code; it should have an MCVE), but you very probably have some undefined behavior, and you should be scared.
Remember that C11 or C99 (like most programming languages) is defined by an explicit specification (not only by the concrete behaviour observed on your code) written in English and partly defining the runtime behaviour of a valid C program. Read n1570.
I strongly recommend reading Lattner's blog What Every C Programmer Should Know About Undefined Behavior before even touching or compiling your source code.
I recommend at least compiling with (nearly) all warnings and debug info, e.g. with gcc -Wall -Wextra -g, then improve the code to get no warnings, and run it under the gdb debugger and valgrind. Read more about Invoking GCC. You may also use (temporarily) some sanitizer instrumentation options, notably -fsanitize=undefined and -fsanitize=address. You could also add -std=gnu99 and -pedantic to your compiler flags. Notice that gdb watchpoints are a very useful debugger feature to find why a value has changed or is unexpected.
When you compile for release or for benchmarking with optimizations enabled, also keep the warning flags (so compile with gcc -O2 -Wall -Wextra); optimizations might produce extra warnings, which you should also fix. BTW, GCC accepts both -O2 and -g at the same time.
When you observe such issues, question your own code first before suspecting the compiler (because compilers are very well tested; I found only one compiler bug in almost 40 years of programming).
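As an illustration of the advice above, here is a minimal sketch of a defensive version of the switch from the question. It assumes (the question does not show this) that the symptom could come from reading a variable that was never initialized, such as valid or test_config. The helper name config_to_value is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    DISABLE,
    INTERNAL,
    EXTERNAL
} test_config_t;

/* Hypothetical helper: maps a configuration to the value that would be
   passed to configure_test(). Returns false for anything outside the enum. */
bool config_to_value(test_config_t cfg, unsigned *value_out)
{
    assert(value_out != NULL);
    *value_out = 0u;   /* always written, so the caller never reads garbage */

    switch (cfg) {
    case DISABLE:
        return true;
    case INTERNAL:
        *value_out = 1u;
        return true;
    case EXTERNAL:
        *value_out = 2u;
        return true;
    default:
        return false;  /* every path returns an explicit validity flag */
    }
}
```

Because every path sets both the output and the return value, the result no longer depends on whatever happened to be on the stack, which is one way a program can behave differently between -O0 and -O2.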

clang, lto, prevent function removal

I am compiling a project with a modified version of clang using link time optimization (lto) and the O2 optimization level. O0 and O1 work fine, but sadly O2 removes some calls to functions. Is there a way to tell the optimizer to leave specific functions alone?
I have already tried using volatile variables as well as __attribute__ ((optimize("0"))) without success.
Solutions only available directly on llvm IR level are also welcome.
Edit: Maybe I should explain the situation with a little more detail.
The modified clang adds calls to a custom runtime lib which is built together with clang.
Some of these inserted calls get optimized away.
I believe __attribute__((used)) (GCC) or llvm.used (LLVM) is what you're looking for.
Adding __attribute__((noinline)) will keep the so-designated functions from disappearing. You could also prevent it globally with -fno-inline.
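A minimal sketch of how the attributes mentioned in the answers above are spelled in GCC and Clang. The names rt_hook, hook_calls and instrumented_work are hypothetical stand-ins for the custom runtime library and the instrumented code; whether this is enough to keep the calls alive under LTO in the asker's setup is an assumption, not something the source confirms.

```c
/* Hypothetical counter; volatile gives the hook an observable side effect
   that the optimizer is not allowed to discard. */
static volatile unsigned long hook_calls = 0;

/* used: keep the definition even if it looks unreferenced.
   noinline: keep it as a real out-of-line call. */
__attribute__((used, noinline))
void rt_hook(int id)
{
    (void)id;
    hook_calls++;
}

/* Hypothetical instrumented function with two inserted hook calls. */
int instrumented_work(int x)
{
    rt_hook(1);
    int r = x * 2;
    rt_hook(2);
    return r;
}
```

Note that used concerns whether the function definition is emitted, while a call can still be dropped if the optimizer can prove the callee has no observable effect, which is why the sketch gives the hook a volatile side effect.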

Clang or GCC equivalent of _PGOPTI_Prof_Dump_All() from ICC

Intel C(++) Compiler has very useful functions to help with profile guided optimisation.
_PGOPTI_Prof_Reset_All();
/* code */
_PGOPTI_Prof_Dump_All();
https://software.intel.com/en-us/node/512800
This is particularly useful for profiling shared libraries which one would use with ctypes in Python.
I've been trying to figure out if either Clang or GCC have similar functionality – apparently not.
Profile guided optimization works differently in gcc and it is enabled with compiler switches. See this question for PGO with gcc.
PGO just recently arrived in clang and is only available starting at version 3.5. The clang user manual gives an overview of how to use it.
It turns out that both have an internal and not properly documented function named __gcov_flush which does this. It is only explained in the source.
/* Called before fork or exec - write out profile information
gathered so far and reset it to zero. This avoids duplication or
loss of the profile information gathered so far. */
It's not quite as convenient as the Intel equivalent though and requires some gymnastics to make it work.
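A rough sketch of the kind of gymnastics this can involve, assuming a toolchain that still provides __gcov_flush (some newer releases are reported to offer __gcov_dump and __gcov_reset instead, so check your toolchain). The macro PROFILE_DUMP_ENABLED and the wrapper profile_dump_now are hypothetical names.

```c
/* Hypothetical build switch: define PROFILE_DUMP_ENABLED only when building
   with profiling instrumentation (e.g. -fprofile-generate). */
#ifdef PROFILE_DUMP_ENABLED
void __gcov_flush(void);   /* declared by hand here; provided by the profiling runtime */
#endif

/* Requests a dump of the profile gathered so far. Returns 1 if a dump was
   requested, 0 if the build has no profiling support compiled in. */
int profile_dump_now(void)
{
#ifdef PROFILE_DUMP_ENABLED
    __gcov_flush();
    return 1;
#else
    return 0;
#endif
}
```

A shared library could export profile_dump_now so that a Python caller using ctypes can trigger the dump at a chosen point, roughly where _PGOPTI_Prof_Dump_All() would be called with ICC.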

Value optimized out in GDB: Can gdb handle decoding it automatically?

1) First, I want to know how to decode such variables.
I know the usual solutions to this problem (remove the optimization flag, make the variable volatile), but I don't want to do any of that. Is there a solution that doesn't require compiling the source again? The problem is that whenever I make a change it takes ages to compile, so I don't want to rebuild with different optimization flags. I also tried changing the optimization flag once, but the program then crashed just because of the change in compilation flags, for reasons I can't fathom.
Also, I am not able to find documentation that explains the various registers shown by "info reg". I was expecting to see some variable (whose value I knew in advance), but info reg is showing me completely different values. I am missing something here. The architecture I am working on is x86_64.
2) I want to know what restrictions gdb faces in decoding such register variables, or whether this problem has already been tackled by someone. I have read in many places that by going through the assembly code you can find out which variable is in which register. If that's true, why can't it be built into gdb? Please point me to relevant pages if there are solutions to this problem.
If you don't have the source and so can't compile with debug info and without optimizations (e.g. third-party code), the best you can do is disassemble the code and try to determine how the variables are stored.
In gdb, the disassemble command will dump the assembly for the given function:
disassemble <function name>
Or if symbols have been stripped
disassemble <address>
where <address> is the entry point to the function.
You may also have to inspect where the function is called to determine the calling conventions used.
Once you've figured out the structure of the functions and variable layout (stack variables or registers), when debugging you can step through each instruction with nexti and stepi and watch how the values in the variables change by dumping the contents of the registers or memory locations.
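For practicing the approach described above, a small function like the following may help. It is only a sketch; the name sum_squares is hypothetical, and how the compiler allocates registers will vary with compiler version and flags.

```c
/* Hypothetical practice function: at -O2 the local acc typically lives only
   in a register, which is when gdb may print <optimized out> for it. */
unsigned sum_squares(unsigned n)
{
    unsigned acc = 0;
    for (unsigned i = 1; i <= n; ++i) {
        acc += i * i;
    }
    return acc;
}
```

After building with optimizations, running disassemble sum_squares in gdb and stepping with stepi while checking info registers should show which register holds the running total. On x86_64 the return value ends up in eax under the System V calling convention.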
I don't know any good primers or tutorials myself but this question and its answers may be of use to you. Personally I find myself referencing the Intel manuals the most. They can be downloaded in pdf from Intel's website. I don't have a link handy at the moment. If someone else does perhaps they can update my answer.
Have you looked at compiling your code un-optimized?
Try one of these in your gcc options:
-Og
Optimize debugging experience. -Og enables optimizations that do not interfere with debugging. It should be the optimization level of choice for the standard edit-compile-debug cycle, offering a reasonable level of optimization while maintaining fast compilation and a good debugging experience.
-O0
Reduce compilation time and make debugging produce the expected results. This is the default.
