I found a lot of optimization options here.
While going through them I noticed that some have side effects (for example, making debugging impossible). In my experience, -O1 through -O3 and -Os are the most commonly used. But what other options are commonly used in your projects?
-ffast-math can have a significant performance impact on floating-point-intensive software.
Also, compiling specifically for the target processor with the appropriate -march= option may have a slight performance impact, although strictly speaking this is not an optimization option.
-march=native with recent versions of gcc removes all the headache of determining the platform on which you are compiling.
Related
I am using GCC's C compiler for ARM. I've compiled Newlib using the C compiler. I went into the makefile for Newlib and saw that the Newlib library gets compiled using -g -O2.
When compiling my code and linking against Newlib's standard C library does this debug information get stripped?
You can use -g and -O2 together. The compiler will optimize the code and keep the debugging information. Of course, in some places you will not get information for a symbol that was removed by optimization and is no longer present.
From the GCC options summary:
Turning on optimization flags makes the compiler attempt to improve the performance and/or code size at the expense of compilation time and possibly the ability to debug the program.
There are multiple flags and options that make debugging difficult or impossible, e.g.:
-fomit-frame-pointer .... It also makes debugging impossible on some machines.
-fsplit-wide-types.... This normally generates better code for those types, but may make debugging more difficult.
-fweb - ... It can, however, make debugging impossible, since variables no longer stay in a “home register”.
The first two are enabled at -O2.
If you want debugging information to be preserved, the following option can be used.
-Og
Optimize debugging experience. -Og enables optimizations that do not interfere with debugging. It should be the optimization level of choice for the standard edit-compile-debug cycle, offering a reasonable level of optimization while maintaining fast compilation and a good debugging experience.
My debug CFLAGS have always been -g -O0, the latter mainly to prevent jumps to unexpected lines while debugging. Nowadays more and more programs refuse to compile with -O0; besides, -D_FORTIFY_SOURCE requires the optimizer.
Is it possible to compile with -O but still have predictable behavior in the debugger?
If you're using GCC 4.8 or above, try using -g -Og. As explained in the release notes:
A new general optimization level, -Og, has been introduced. It addresses the need for fast compilation and a superior debugging experience while providing a reasonable level of run-time performance. Overall experience for development should be better than the default optimization level -O0.
I am working on a CPU with the Nehalem/Westmere Intel microarchitecture. I want to optimize my code for this architecture. Are there any specialized compilation flags or GCC built-in functions that will help me improve my code's run-time performance?
I am already using -O3.
Language of the Code - C
Platform - Linux
GCC Version - 4.4.6 20110731 (Red Hat 4.4.6-3) (GCC)
In my code I have some floating point comparisons and they are done over a million times.
Please assume the code is already best optimized.
First, if you really want to profit from optimization on newer processors like this one, you should install the newest version of the compiler. 4.4 came out some years ago, and even though it still seems maintained, I doubt that the newer optimization code is backported to it. (The current version is 4.7.)
GCC has a catch-all optimization flag that usually produces code optimized for the compilation architecture: -march=native. Together with -O3 this should be all that you need.
Warning: the answer is incorrect.
You can actually analyze all disabled and enabled optimizations yourself. Run on your computer:
gcc -O3 -Q --help=optimizers | grep disabled
Then read about the flags that are still disabled and that, according to the GCC documentation, can influence performance.
You'll want to add an -march=... option. The ... should be replaced with whatever is closest to your CPU architecture (there tend to be minor differences) described in the i386/x86_64 options for GCC here.
I would use core2 because corei7 (the one you'd want) is only available in GCC 4.6 and later. See the arch list for GCC 4.6 here.
If you really want to use a GCC so old that it doesn't support corei7, you could use -mtune=barcelona.
Link-time optimization (LTO) (a.k.a. unity build) is included in GCC 4.5 or later and other compilers have similar optimization passes. Doesn't this make certain code patterns much more viable than before?
For example, for maximum performance a "module" of C code often needs to expose its guts. Does LTO make this obsolete? What code patterns are now viable that were not before?
I believe that LTO is simply an optimization, but not necessarily one that obviates the need for documentation of implementation ("exposing the guts") of any module. Whole languages have been written to that effect; I do not think C will have that need removed from it soon, or perhaps ever.
From the description of the LTO feature in gcc:
Link Time Optimization (LTO) gives GCC the capability of dumping its internal representation (GIMPLE) to disk, so that all the different compilation units that make up a single executable can be optimized as a single module. This expands the scope of inter-procedural optimizations to encompass the whole program (or, rather, everything that is visible at link time).
From the announcement of LTO's inclusion into gcc:
The result should, in principle, execute faster but our IPA cost models are still not tweaked for LTO. We've seen speedups as well as slowdowns in benchmarks (see the LTO testers at http://gcc.opensuse.org/).
I'm using the standard GCC compiler for math software development in C. I don't know much about compilers or compiler options, and I was just wondering: is it possible to make faster executables by using another compiler or choosing better options? The default Makefile sets the options -ffast-math and -O3, and I think both have some impact on the overall calculation time. My software uses memory quite extensively, so I imagine some options related to memory management might do the trick?
Any ideas?
Before experimenting with different compilers or random, arbitrary micro-optimisations, you really need to get a decent profiler and profile your code to find out exactly what the performance bottlenecks are. The actual picture may be very different from what you imagine it to be. Once you have a profile you can then start to consider what might be useful optimisations. E.g. changing compiler won't help you if you are limited by memory bandwidth.
Here are some tips about gcc performance:
Do benchmarks with -Os, -O2 and -O3. Sometimes -O2 will be faster because it generates shorter code. Since you said that you use a lot of memory, try -Os too and take measurements.
Also check out the -march=native option (it is considered safe to use if you are making executables for computers with processors similar to the build machine's). Sometimes it can have a considerable impact on performance. If you need a list of the options GCC uses with native, here's how to get it:
Make a small C program called test.c, then
$ touch test.c
$ gcc -march=native -fverbose-asm -S test.c
$ cat test.s
Credit for this snippet goes to Gentoo forums users.
It should print out a list of all the optimizations GCC used. Note that if you're using a Core i7, GCC 4.5 will detect it as Atom, so you'll need to set -march and -mtune manually.
Also read this document; it will help you (still, in my experience on Gentoo, -march=native works better): http://gcc.gnu.org/onlinedocs/gcc/i386-and-x86_002d64-Options.html
You could also try the new options in late 4.4 and early 4.5 versions, such as -flto and -fwhole-program. These should help with performance, but when I experimented with them my system was unstable. In any case, read this document too; it will help you understand some of GCC's optimization options: http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
If you are running Linux on x86 then typically the Intel or PGI compilers will give you significantly faster performing executables.
The downsides are that there are more knobs to tune and that they come with a hefty price tag!
If you have specific hardware you can target your code for, the (hardware) company often releases paid-for compilers optimized for that hardware.
For example:
xlc for AIX
CC for Solaris
These compilers will generally produce better code optimization-wise.
As you say your program is memory-heavy, you could try using a different malloc implementation than the one in your platform's standard library.
For example, you could try jemalloc (http://www.canonware.com/jemalloc/).
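One low-effort way to test it, assuming a distro-packaged jemalloc (no recompilation needed; /bin/ls stands in for your program):

```shell
# Find the installed jemalloc shared object, if any.
JEMALLOC=$(ldconfig -p 2>/dev/null | awk '/libjemalloc\.so/ {print $NF; exit}')
if [ -n "$JEMALLOC" ]; then
    # LD_PRELOAD swaps the allocator under an unmodified binary.
    LD_PRELOAD="$JEMALLOC" /bin/ls / >/dev/null && echo "ran with jemalloc"
else
    echo "jemalloc not installed; nothing to preload"
fi
```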
Keep in mind that most improvements to be had by changing compilers or settings will only get you proportional speedups, whereas by adjusting algorithms you can sometimes get improvements in the O() behavior of your program. Be sure to exhaust that avenue before you put too much work into tweaking settings.