gcc -funswitch-loops and -O3 - loops

I'm interested in the loop unswitching optimization option -funswitch-loops to GCC, in particular what actually enables it. According to the documentation:
The following options control optimizations that may improve performance, but are not enabled by any -O options. This section includes experimental options that may produce broken code.
...
-funswitch-loops
Move branches with loop invariant conditions out of the loop, with duplicates of the loop on both branches (modified according to result of the condition).
Enabled by -fprofile-use and -fauto-profile.
So if I'm not already using -fprofile-use or -fauto-profile, it seems that I have to explicitly add -funswitch-loops to my list of compiler flags in order to activate loop unswitching. Fair enough. Though elsewhere in the same documentation, we find
-O3
Optimize yet more. -O3 turns on all optimizations specified by -O2 and also turns on the following optimization flags:
...
-funswitch-loops
...
So the documentation seems to claim that -funswitch-loops is switched on by -O3, but also that it is not turned on by any of the -O options. Which one is it?

You can compile with -v -Q to see a list of all the optimization options that are in effect. With -v -Q -O3 on gcc 10.2, I see -funswitch-loops is included.
So its listing under "not enabled by any -O options" is apparently an error. You could report it as a documentation bug.
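For readers unfamiliar with the transformation, here is a hand-written sketch in C of what loop unswitching amounts to. The function names and the use_scale parameter are made up for illustration. GCC does this on its internal representation, not on source, and whether it fires for a given loop depends on its own heuristics.

```c
#include <stddef.h>

/* Original form: the test on use_scale is loop-invariant,
   but it is evaluated on every iteration. */
long sum_branch_inside(const int *a, size_t n, int use_scale, int scale)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (use_scale)
            total += (long)a[i] * scale;
        else
            total += a[i];
    }
    return total;
}

/* Hand-unswitched form: the branch is hoisted out of the loop
   and the loop body is duplicated, one copy per branch. */
long sum_unswitched(const int *a, size_t n, int use_scale, int scale)
{
    long total = 0;
    if (use_scale) {
        for (size_t i = 0; i < n; i++)
            total += (long)a[i] * scale;
    } else {
        for (size_t i = 0; i < n; i++)
            total += a[i];
    }
    return total;
}
```

Both functions return the same value for the same inputs. The second one tests use_scale once instead of once per iteration, at the cost of duplicated loop code.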

Related

gcc linking object files with warning/optimization flags

We are compiling a piece of software using generics, where the source files are first made into object files. They are built like so:
arm-unknown-linux-gnu-gcc -c -O2 -Wstrict-prototypes -Wdeclaration-after-statement -fsigned-char -I/opt/tm-sdk/include -mlittle-endian -Wno-trigraphs -fno-strict-aliasing -fno-omit-frame-pointer -march=armv4 -mtune=arm9tdmi -Wall -Wextra -o src/flex.o src/flex.c
...
arm-unknown-linux-gnu-gcc -c -O2 -Wstrict-prototypes -Wdeclaration-after-statement -fsigned-char -I/opt/tm-sdk/include -mlittle-endian -Wno-trigraphs -fno-strict-aliasing -fno-omit-frame-pointer -march=armv4 -mtune=arm9tdmi -Wall -Wextra -o src/flexdb.o src/flexdb.c
Then they are linked with:
arm-unknown-linux-gnu-gcc -o flex src/flex.o src/flexdb.o src/flexio.o src/flexprotocol.o src/flexsettings.o src/flexstate.o -L/opt/tm-sdk/lib -ltag -lrt -ltmreader -lsqlite3 -lsha1
My question is:
Do we need to include optimization and warning flags during linking? Would it do anything if -Wall, -Wextra, and -O2 were included when creating the flex binary from the object files?
Edit: Clarifying meaning based on feedback.
Do we need to include optimization and warning flags during this final stage of compilation?
Of course you don't need to include them for the link stage. You already know that, because you don't include them. But I think what you really want to know is ...
Would it do anything if -Wall, -Wextra, and -O2 were included when building the flex binary from object files.
All or almost all warnings are generated during the compilation stage. I don't off-hand know any exceptions, but it's conceivable that there are some. Thus, it's possible that passing warning-related flags during linking would trigger warnings that otherwise you would not receive. But that shouldn't affect the compiled binary in any way.
Optimization is different. There are optimizations that can be performed at link time, but that might not be performed at the default optimization level. Omitting optimization flags from the link command should not break your build, but including them may result in a binary that is faster and / or smaller.
Overall, I see no good reason to avoid passing the same warning and optimization flags during the link step that you do during the compilation steps. If you wish, it's not harmful to pass preprocessor-specific flags (e.g. -D) as well, as they'll just be ignored during linking. I presume that all this is managed by make, so it's not like you actually need to type out the options every time.
No. You're just calling the linker with the final call to gcc, and the -W and -O flags are for the compiler.
-Wall is primarily a preprocessor option, but there is also a reference for the libraries. See here
-Wextra appears to be strictly a c++ preprocessor option. See here
-O2 is a compiler optimization level setting. See here
So to answer your precise question, only -Wall would possibly help during your link step. The other two would not. You could test this by building with and without these options, seeing if any additional output is created in the case of warnings and if the code size or execution time differed between builds.

How to optimize the executable generated by a C compilation in Linux terminal

Is there any way to reduce the memory used by an executable generated with a command like gcc source_file.c -o result? I browsed the Internet and also looked in the man page for "gcc" and I think that I should use something related to -c or -S. So would gcc -c -S source_file.c -o result work? (This seems to reduce the space used... is there any other way to reduce it even more?)
Thanks,
Polb
The standard compiler option on POSIX-like systems to instruct the compiler to optimize is -O (capital letter O for optimize). Many compilers allow you to optionally specify an optimization level after -O. Common optimization levels include:
-O0 no optimization at all
-O1 basic optimization for speed
-O2 all of -O1 plus some advanced optimizations
-O3 all of -O2 plus expensive optimizations that aren't usually needed
-Os optimize for size instead of speed (gcc, clang)
-Oz optimize even more for size (clang)
-Og all of -O1 except for optimizations that hinder debugging (gcc)
-Ofast all of -O3 and some numeric optimizations not in conformance with standard C. Use with caution. (gcc)
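If you want to check from inside the program which kind of optimization was requested, GCC (and, as far as I know, clang) predefines __OPTIMIZE__ when optimizing and additionally __OPTIMIZE_SIZE__ when optimizing for size. A small sketch, where the function name optimization_mode is made up for this example:

```c
/* Reports the broad optimization mode this file was compiled with.
   __OPTIMIZE_SIZE__ is tested first because size-optimizing builds
   define __OPTIMIZE__ as well. */
const char *optimization_mode(void)
{
#if defined(__OPTIMIZE_SIZE__)
    return "size";
#elif defined(__OPTIMIZE__)
    return "speed";
#else
    return "none";
#endif
}
```

These macros only tell you whether optimization is on and whether it targets size. They do not distinguish -O1 from -O2 or -O3.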
The -S option generates assembler output.
A better option is to use LLVM and generate assembler for multiple architectures with llc, similar to http://kripken.github.io/llvm.js/demo.html
Here is an example: https://idea.popcount.org/2013-07-24-ir-is-better-than-assembly/

Compiler flags when setting debug and release mode

gcc (GCC) 4.7.2
cmake version 2.8.10.2
c89
I am using cmake as my build system, and I want to set the flags according to whether the build is debug or release.
Compile with debug
SET(CMAKE_C_FLAGS_DEBUG "-Wall -Wextra -Wunreachable-code -g -m32 -D_DEBUG -O0 -D_LARGEFILE64_SOURCE -D_REETRANT -D_THREAD_SAFE")
Compile with release
SET(CMAKE_C_FLAGS_RELEASE "-Wall -Wextra -Wunreachable-code -m32 -DNDEBUG -O2 -D_LARGEFILE64_SOURCE -D_REETRANT -D_THREAD_SAFE")
For the debug I have included the flags -g and set optimization to level 0.
For the release I have removed the -g flag, added the N for -DNDEBUG, -DNREETRANT, and -DNTHREAD_SAFE, and set the optimization to level 2.
Am I right in saying that the N means NO DEBUG?
Is this all I would need to do to distinguish between a debug and release build?
Here are some recommendations based on your CMake flags:
Debug mode -g -O0, and release mode -O2 -g. It is better to add the -g flag even for the -O2 release build if code size is not critical. With -g, some symbol information is included in the binary. It only increases the total binary size by about 10% to 20%, but users can then give you a stack dump, which helps a lot with debugging.
Use -UDEBUG -UREETRANT instead of -DNDEBUG -DNREETRANT. The -DMACRO flag is just like adding #define MACRO to your code: if you have a code region like #ifdef MACRO SOME_DEBUG_CODE #endif, the preprocessor will keep the code there. The -UMACRO flag is like #undef MACRO, and the preprocessor will delete SOME_DEBUG_CODE. If you just add -DNDEBUG -DNREETRANT, it is like adding #define NDEBUG and #define NREETRANT to your code, which is meaningless unless something actually tests those names (the standard assert macro does test NDEBUG).
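To make the -D / -U / NDEBUG discussion concrete, here is a minimal sketch in C. MY_DEBUG is a hypothetical project macro (not something from the question), and the two function names are made up. NDEBUG is the standard macro that <assert.h> checks.

```c
/* MY_DEBUG is a hypothetical project macro: compile with -DMY_DEBUG
   to define it, or with -UMY_DEBUG (or no flag at all) to leave it
   undefined. */
int debug_code_compiled_in(void)
{
#ifdef MY_DEBUG
    return 1;   /* this branch survives only when MY_DEBUG is defined */
#else
    return 0;
#endif
}

/* NDEBUG is the standard switch for assert(): with -DNDEBUG every
   assert() in the file expands to nothing. */
int asserts_enabled(void)
{
#ifdef NDEBUG
    return 0;
#else
    return 1;
#endif
}
```

Built with no extra flags, debug_code_compiled_in() returns 0 and asserts_enabled() returns 1. Adding -DMY_DEBUG flips the first, and adding -DNDEBUG flips the second.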
What would you need -D_DEBUG for if you could just as well check for the absence of NDEBUG? (Unless, of course, you want separate control over assert and other debugging #ifdefs. I never had that need.)
Why would you want different settings for reentrancy and thread safety?
Bottom line, what you want different between Debug and Release code is completely up to you. I want my debug code to include coverage information, and my release code to be -O3. Your mileage obviously varies. Neither way is "right" or "wrong".
You might also want to note that:
CMake sets stuff like -g -O0 and -O2 automatically, so it would be sufficient to append the desired warnings and additional flags.
You might want to keep an eye on portability and check for if ( CMAKE_COMPILER_IS_GNUCC ) (or ..._GNUCXX for C++) before you set the compiler flags.
As Ling Kun said, you can keep -g even in Release code. At least during development it is useful, and you can command CMake to strip executables during installation. (Personally, I strive for not having to make my customers extract post-mortem stack / core dumps. ;-) )

Differences between -O0 and -O1 in GCC

While compiling some code I noticed big differences in the assembler created between -O0 and -O1. I wanted to run through enabling/disabling optimisations until I found out what was causing a certain change in the assembler.
If I use -fverbose-asm to find out exactly which flags O1 is enabling compared to O0, and then disable them manually, why is the assembler produced still so massively different? Even if I run gcc with O0 and manually add all the flags that fverbose-asm said were enabled with O1, I don't get the same assembler that I would have got just by using O1.
Is there anything apart from '-f...' and '-m...' that can be changed?
Or is it just that 'O1' has some magic compared with 'O0' that cannot be turned off?
Sorry for the crypticness - this was related to Reducing stack usage during recursion with GCC + ARM however the mention of it was making the question a bit hard to understand.
If all you want is to see which passes are enabled at O1 which are not enabled at O0 you could run something like:
gcc -O0 test.c -fdump-tree-all -da
ls > O0
rm -f test.c.*
gcc -O1 test.c -fdump-tree-all -da
ls > O1
diff O0 O1
A similar process, using the set of flags which you discovered, will let you see what extra magic passes not controlled by flags are undertaken by GCC at O1.
EDIT:
A less messy way might be to compare the output of -fdump-passes, which will list which passes are ON or OFF to stderr.
So something like:
gcc -O0 test.c -fdump-passes |& grep ON > O0
gcc -O1 test.c -fdump-passes |& grep ON > O1
diff O0 O1
Not that this helps, other than providing some evidence for your suspicions about -O1 magic that can't be turned off:
From http://gcc.gnu.org/ml/gcc-help/2007-11/msg00214.html:
CAVEAT, not all optimizations enabled by -O1 have a command-line toggle flag to disable them.
From Hagen's "Definitive Guide to GCC, 2nd Ed":
Note: Not all of GCC’s optimizations can be controlled using a flag. GCC performs some optimizations automatically and, short of modifying the source code, you cannot disable these optimizations when you request optimization using -O
Unfortunately, I haven't found any clear statement about what these hard-coded optimizations might be. Hopefully someone who is knowledgeable about GCC's internals might post an answer with some information about that.
In addition to the many options you can also change parameters, e.g.
--param max-crossjump-edges=1
which affects the code generation. Check the source file params.def for all available params.
But there is no way to switch from -O0 to -O1, or from -O1 to -O2, or to or from -Os, and so on, just by adding options, without patching the source code, since there are several hard-coded locations where the level is checked without consulting a command-line option, e.g.:
return perform_tree_ssa_dce (/*aggressive=*/optimize >= 2);
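The point about hard-coded level checks can be pictured with a toy model. Everything here (struct toy_options, toy_dce_mode, the field names) is invented for illustration and is not GCC's actual code. It only mirrors the shape of the optimize >= 2 check quoted above.

```c
/* Hypothetical stand-ins, not GCC's real data structures. */
struct toy_options {
    int optimize;   /* level requested with -O0 ... -O3 */
    int flag_dce;   /* on/off toggle, as an -f... flag would set it */
};

/* Returns 0 if the pass is skipped, 1 for the normal variant and
   2 for the aggressive variant. Only the on/off decision consults
   the flag; aggressiveness depends on the level alone. */
int toy_dce_mode(const struct toy_options *opts)
{
    if (!opts->flag_dce)
        return 0;
    return opts->optimize >= 2 ? 2 : 1;
}
```

In this model no setting of flag_dce at level 1 reproduces what level 2 does, which is the situation described above for the real compiler.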

Which gcc optimization flags should I use?

If I want to minimize the time my C programs take to run, what optimization flags should I use? (I want to keep it standard too.)
Currently I'm using:
-Wall -Wextra -pedantic -ansi -O3
Should I also use
-std=c99
for example?
And is there a specific order I should put those flags in my makefile? Does it make any difference?
And also, is there any reason not to use all the optimization flags I can find? Do they ever counter each other or something like that?
I'd recommend compiling new code with -std=gnu11, or -std=c11 if needed. Silencing all -Wall warnings is usually a good idea, IIRC. -Wextra warns for some things you might not want to change.
A good way to check how something compiles is to look at the compiler asm output. http://gcc.godbolt.org/ formats the asm output nicely (stripping out the noise). Putting some key functions up there and looking at what different compiler versions do is useful if you understand asm at all.
Use a new compiler version. gcc and clang have both improved significantly in newer versions. gcc 5.3 and clang 3.8 are the current releases at the time of writing. gcc5 makes noticeably better code than gcc 4.9.3 in some cases.
If you only need the binary to run on your own machine, you should use -O3 -march=native.
If you need the binary to run on other machines, choose the baseline for instruction-set extensions with stuff like -mssse3 -mpopcnt. You can use -mtune=haswell to optimize for Haswell even while making code that still runs on older CPUs (as determined by -march).
If your program doesn't depend on strict FP rounding behaviour, use -ffast-math. If it does, you can usually still use -fno-math-errno and stuff like that, without enabling -funsafe-math-optimizations. Some FP code can get big speedups from fast-math, like auto-vectorization.
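As a small illustration of why -ffast-math can change results: floating-point addition is not associative, and -ffast-math (through -funsafe-math-optimizations) lets the compiler reassociate. The sketch below, with made-up function names, shows two groupings of the same sum giving different answers under strict IEEE 754 double arithmetic.

```c
/* Adds in the order (a + b) + c. */
double sum_left(double a, double b, double c)
{
    return (a + b) + c;
}

/* Adds in the order a + (b + c). */
double sum_right(double a, double b, double c)
{
    return a + (b + c);
}
```

Compiled without -ffast-math, sum_left(1e20, -1e20, 1.0) gives 1.0 and sum_right(1e20, -1e20, 1.0) gives 0.0, because the 1.0 is lost when it is added to -1e20 first. With -ffast-math the compiler is free to pick either grouping.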
If you can usefully do a test-run of your program that exercises most of the code paths that need to be optimized for a real run, then use profile-directed optimization:
gcc -fprofile-generate -Wall -Wextra -std=gnu11 -O3 -ffast-math -march=native -fwhole-program *.c -o my_program
./my_program -option1 < test_input1
./my_program -option2 < test_input2
gcc -fprofile-use -Wall -Wextra -std=gnu11 -O3 -ffast-math -march=native -fwhole-program *.c -o my_program
-fprofile-use enables -funroll-loops, since it has enough information to decide when to actually unroll. Unrolling loops all over the place can make things worse. However, it's worth trying -funroll-loops to see if it helps.
If your test runs don't cover all the code paths, then some important ones will be marked as "cold" and optimized less.
-O3 enables auto-vectorization, which -O2 doesn't in these gcc versions (GCC 12 and later enable a limited form of it at -O2). This can give big speedups.
-fwhole-program allows cross-file inlining, but only works when you put all the source files on one gcc command-line. -flto is another way to get the same effect. (Link-Time Optimization). clang supports -flto but not -fwhole-program.
-fomit-frame-pointer has been the default for a while now for x86-64, and more recently for x86 (32bit).
As well as gcc, try compiling your program with clang. Clang sometimes makes better code than gcc, sometimes worse. Try both and benchmark.
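For a feel of what the auto-vectorization mentioned above applies to, here is a sketch of a loop that compilers can typically vectorize. The function name saxpy and its signature are my own choice for the example. restrict is there so the compiler may assume x and y don't overlap.

```c
#include <stddef.h>

/* Computes y[i] = alpha * x[i] + y[i] for i in [0, n).
   Independent iterations over contiguous arrays are the typical
   shape a vectorizer looks for. */
void saxpy(size_t n, float alpha, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = alpha * x[i] + y[i];
}
```

With GCC, adding -fopt-info-vec to the command line reports which loops were vectorized, so you can check rather than assume.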
The flag -std=c99 does not change the optimization levels. It only changes which language standard you want the compiler to conform to.
You use -std=c99 when you want your program to be treated as a C99 program by the compiler.
The only flag that has to do with optimization among those you specified is -O3. The others serve other purposes.
You may want to add -funroll-loops, which -O3 does not turn on by itself. -fomit-frame-pointer should already be included in -O3.

Resources