-Og is a relatively new optimization option that is intended to improve the debugging experience while still applying optimizations. If a user selects -Og, then I'd like my source files to activate alternate code paths to enhance the debugging experience. GCC offers the __OPTIMIZE__ preprocessor macro, but it's only set to 1 when optimizations are in effect; it does not say which level was chosen.
Is there a way to learn the optimization level, like -O1, -O3 or -Og, for use with the preprocessor?
I don't know if this is a clever hack, but it is a hack.
$ gcc -Xpreprocessor -dM -E - < /dev/null > 1
$ gcc -Xpreprocessor -dM -O -E - < /dev/null > 2
$ diff 1 2
53a54
> #define __OPTIMIZE__ 1
68a70
> #define _FORTIFY_SOURCE 2
154d155
< #define __NO_INLINE__ 1
clang didn't produce the FORTIFY one.
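Building on the diff above, a source file can at least tell "no optimization" apart from "some optimization" and from -Os. This is only a sketch, assuming GCC or Clang; the function names build_flavor and inlining_disabled are made up for illustration, and none of these macros distinguishes -O1, -O2, -O3 or -Og from one another.

```c
/* Hypothetical helper: classify the build using only macros that GCC
   documents (__OPTIMIZE__, __OPTIMIZE_SIZE__). */
const char *build_flavor(void)
{
#if defined(__OPTIMIZE_SIZE__)      /* set together with __OPTIMIZE__ at -Os */
    return "optimizing for size";
#elif defined(__OPTIMIZE__)         /* any other optimizing level */
    return "optimizing (level unknown)";
#else
    return "not optimizing";
#endif
}

/* Hypothetical helper: nonzero when the compiler reports that it will not
   inline functions, e.g. at -O0 or with -fno-inline. */
int inlining_disabled(void)
{
#ifdef __NO_INLINE__
    return 1;
#else
    return 0;
#endif
}
```

So the original goal (detecting -Og specifically) is still out of reach this way; the macros only narrow it down.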
I don't believe it is possible to learn the optimization level directly, because it is not among the predefined preprocessor symbols.
You could rely on -DNDEBUG (no debug), which is used to disable assertions in release code, and enable your "debug" code path when it is not defined.
However, I believe a better approach is to have a project-wide set of symbols and let the user choose explicitly which ones to use:
MYPROJECT_DNDEBUG
MYPROJECT_OPTIMIZE
MYPROJECT_OPTIMIZE_AGGRESSIVELY
This makes debugging, and tracking down differences in behavior between release and debug builds, much easier, since you can turn the individual behaviors on and off incrementally.
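One way to wire this up, as a rough sketch: the build system passes a project macro next to the optimization flag, for example -Og -DMYPROJECT_DEBUG_FRIENDLY. The names MYPROJECT_DEBUG_FRIENDLY, MYPROJECT_TRACING and myproject_trace are invented here for illustration; only MYPROJECT_OPTIMIZE_AGGRESSIVELY comes from the list above.

```c
#include <stdio.h>

/* Reject contradictory requests at compile time. */
#if defined(MYPROJECT_DEBUG_FRIENDLY) && defined(MYPROJECT_OPTIMIZE_AGGRESSIVELY)
#error "Pick either MYPROJECT_DEBUG_FRIENDLY or MYPROJECT_OPTIMIZE_AGGRESSIVELY"
#endif

/* Hypothetical switch: tracing is compiled in only for debug-friendly builds. */
#ifdef MYPROJECT_DEBUG_FRIENDLY
#define MYPROJECT_TRACING 1
#else
#define MYPROJECT_TRACING 0
#endif

/* Returns 1 if a trace line was written, 0 if tracing is compiled out. */
int myproject_trace(const char *msg)
{
#if MYPROJECT_TRACING
    fprintf(stderr, "trace: %s\n", msg);
    return 1;
#else
    (void)msg;   /* unused when tracing is disabled */
    return 0;
#endif
}
```

The point of the design is that the source never has to guess the -O level; whoever picks -Og also picks the macro.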
Some system-specific preprocessor macros exist, depending on your target. For example, the Microchip-specific XC16 variant of gcc (currently based on gcc 4.5.1) has the preprocessor macro __OPTIMIZATION_LEVEL__, which takes on values 0, 1, 2, s, or 3.
Note that overriding optimization for a specific routine, e.g. with __attribute__((optimize(0))), does not change the value of __OPTIMIZE__ or __OPTIMIZATION_LEVEL__ within that routine.
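The reason is that the preprocessor has finished before any attribute is looked at. A small sketch of that, assuming GCC (the attribute is wrapped in a made-up NO_OPT macro so that other compilers simply skip it; the function names are hypothetical too):

```c
#if defined(__GNUC__) && !defined(__clang__)
#define NO_OPT __attribute__((optimize(0)))   /* GCC-specific attribute */
#else
#define NO_OPT                                /* no-op elsewhere */
#endif

/* Ordinary function: reports whether __OPTIMIZE__ is defined. */
int optimize_macro_in_plain_function(void)
{
#ifdef __OPTIMIZE__
    return 1;
#else
    return 0;
#endif
}

/* Same body, but compiled without optimization where the attribute is
   supported. The #ifdef was already resolved for the whole translation
   unit, so the answer is identical. */
NO_OPT int optimize_macro_in_O0_function(void)
{
#ifdef __OPTIMIZE__
    return 1;
#else
    return 0;
#endif
}
```

Both functions return the same value regardless of the command-line -O level.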
I have a code like this in mylib.h, and then I use it to create mylib.so.
Is there a way to check how MY_MACROS is defined in .so?
#ifdef SWITCH_CONDITION
#define MY_MACROS 0
#else
#define MY_MACROS 1
#endif
If that would be a function, I'd simply do
nm mylib.so | grep myfunction
Is there a way to do the same for macros?
P.S. There should be because of
> grep MY_MACROS mylib.so
> Binary file mylib.so matches
In general there is no way to do this sort of thing for macros. (But see more below.)
Preprocessor macros are theoretically a compile-time concept. In fact, in the early implementations of C, the preprocessor was -- literally -- a separate program, running in a separate process, converting C code with #include and #define and #ifdef into C code without them. The actual C compiler saw only the "preprocessed" code.
Now, theoretically a compiler could somehow save away some record of macro definitions, perhaps to aid in debugging. I wasn't aware of any that did this, although evidently those using the DWARF format actually do! See comments below, and this answer.
You can always write your own, explicit code to track the definition of certain macros. For example, I've often written code along the lines of
#include <stdio.h>

/* VERSION_STRING is assumed to be defined elsewhere, e.g. by the build. */
void print_version(void)
{
    printf("myprogram version %s", VERSION_STRING);
#ifdef DEBUG
    printf(" (debug version)");
#endif
    printf("\n");
}
Some projects have rather elaborate mechanisms to keep track of the compilation switches which are in effect for a particular build. For example, in projects managed by a configure script, there's often a single file config.status containing one single record of all the compilation options, for posterity.
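Along the same lines, the library itself can carry the value as an ordinary string, which tools like strings can then find. A minimal sketch, assuming the MY_MACROS definition from the question; mylib_build_info, mylib_get_build_info and the MYLIB_STR helpers are hypothetical names:

```c
/* From mylib.h in the question. */
#ifdef SWITCH_CONDITION
#define MY_MACROS 0
#else
#define MY_MACROS 1
#endif

/* Two-step stringification so MY_MACROS is expanded before # is applied. */
#define MYLIB_STR_(x) #x
#define MYLIB_STR(x)  MYLIB_STR_(x)

/* Non-static, so the object is part of the library's exported data. */
const char mylib_build_info[] = "mylib build: MY_MACROS=" MYLIB_STR(MY_MACROS);

/* Accessor so that programs using the library can query it at run time. */
const char *mylib_get_build_info(void)
{
    return mylib_build_info;
}
```

With this in place, strings mylib.so | grep 'mylib build' should show the value without needing debug info.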
Yes, but it requires debugging info.
You can compile your code with -g3:
$ gcc -g3 -shared -fPIC test.c -o test.so
and then run strings on the resulting binary:
$ strings test.so
...
__DEC32_EPSILON__ 1E-6DF
MY_MACROS 1
__UINT_LEAST32_TYPE__ unsigned int
Is there a way that GCC does not optimize any function calls?
In the generated assembly code, the printf function is replaced by putchar. This happens even with the default -O0 minimal optimization flag.
#include <stdio.h>

int main(void) {
    printf("a");
    return 0;
}
(Godbolt is showing GCC 9 doing it, and Clang 8 keeping it unchanged.)
Use -fno-builtin to disable all replacement and inlining of standard C functions with equivalents. (This is very bad for performance in code that assumes memcpy(x, y, 4) will compile to just an unaligned/aliasing-safe load, not a function call. It also disables constant-propagation such as strlen of string literals. So normally you'd want to avoid it in practical use.)
Or use -fno-builtin-FUNCNAME for a specific function, like -fno-builtin-printf.
By default, some commonly-used standard C functions are handled as builtin functions, similar to __builtin_popcount. The handler for printf replaces it with putchar or puts if possible.
6.59 Other Built-in Functions Provided by GCC
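To make the substitution more concrete, here is a small sketch of three call shapes. The comments describe what GCC has been observed to do; they are not guarantees, and the function names are made up. Compiling this with and without -fno-builtin-printf and comparing the assembly is a quick way to check on your own version.

```c
#include <stdio.h>

/* Result unused, single character: GCC may emit putchar('a') instead. */
void print_one_char(void)
{
    printf("a");
}

/* Result unused, string ending in a newline: GCC may emit puts("hello"). */
void print_line(void)
{
    printf("hello\n");
}

/* Result used: printf returns the number of characters written, while
   putchar returns the character itself, so a swap would change the value.
   In my understanding GCC leaves this one as a printf call. */
int print_and_count(void)
{
    return printf("a");
}
```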
The implementation details of a C statement like printf("a") are not considered a visible side effect by default, so they aren't something that gets preserved. You can still set a breakpoint at the call site and step into the function (at least in assembly, or in source mode if you have debug symbols installed).
To disable other kinds of optimizations for a single function, see __attribute__((optimize(0))) on a function or #pragma GCC optimize. But beware:
The optimize attribute should be used for debugging purposes only. It is not suitable in production code.
You can't disable all optimizations. Some optimization is inherent in the way GCC transforms through an internal representation on the way to assembly. See Disable all optimization options in GCC.
E.g., even at -O0, GCC will optimize x / 10 to a multiplicative inverse.
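As a worked illustration of that multiplicative inverse, assuming a 32-bit unsigned x (the signed case needs extra fix-up steps, and the exact instruction sequence GCC picks depends on the type and target), the division can be written by hand like this; div10_by_multiply is a hypothetical name:

```c
#include <stdint.h>

/* x / 10 for 32-bit unsigned x without a divide instruction:
   multiply by ceil(2^35 / 10) = 0xCCCCCCCD in 64 bits, then shift
   right by 35. The rounding error stays below 0.1, so the floor
   matches x / 10 for every 32-bit input. */
uint32_t div10_by_multiply(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * UINT64_C(0xCCCCCCCD)) >> 35);
}
```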
It still stores everything to memory between C statements (for consistent debugging; that's what -O0 really means); GCC doesn't have a "fully dumb" mode that tries to transliterate C to assembly as naively as possible. Use tcc for that. Clang and ICC with -O0 are somewhat more literal than GCC, and so is MSVC debug mode.
Note that -g never has any effect on code generation, only on the metadata emitted. GCC uses other options (mostly -O, -f*, and -m*) to control code generation, so you can always safely enable -g without hurting performance, other than a larger binary. It's not debug mode (that's -O0); it's just debug symbols.
I am trying to learn preprocessor tricks that I found not so easy (Can we have recursive macros?, Is there a way to use C++ preprocessor stringification on variadic macro arguments?, C++ preprocessor __VA_ARGS__ number of arguments, Variadic macro trick, ...). I know the -E option to see the result of the preprocessor whole pass but I would like to know, if options or means exist to see the result step by step. Indeed, sometimes it is difficult to follow what happens when a macro calls a macro that calls a macro ... with the mechanism of disabling context, painting blue ... In brief, I wonder if a sort of preprocessor debugger with breakpoints and other tools exists.
(Do not answer that this use of preprocessor directives is dangerous, ugly, horrible, not good practices in C, produces unreadable code ... I am aware of that and it is not the question).
Yes, this tool exists as a feature of Eclipse IDE. I think the default way to access the feature is to hover over a macro you want to see expanded (this will show the full expansion) and then press F2 on your keyboard (a popup appears that allows you to step through each expansion).
When I used this tool to learn more about macros it was very helpful. With just a little practice, you won't need it anymore.
In case anyone is confused about how to use this feature, I found a tutorial on the Eclipse documentation here.
This answer to another question is relevant.
When you do weird preprocessor tricks (which are legitimate) it is useful to ask the compiler to generate the preprocessed form (e.g. with gcc -C -E if using GCC) and look into that preprocessed form.
In practice, for a source file foo.c it makes (sometimes) sense to get its preprocessed form foo.i with gcc -C -E foo.c > foo.i and look into that foo.i.
Sometimes, it even makes sense to get that foo.i without line information. The trick here (removing line information contained in lines starting with #) would be to do:
gcc -C -E foo.c | grep -v '^#' > foo.i
Then you could indent foo.i and compile it, e.g. with gcc -Wall -c foo.i; you'll get error locations in the preprocessed file and you could understand how you got that and go back to your preprocessor macros (or their invocations).
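For something small to try these commands on, here is a sketch of a foo.c with a macro that calls a macro (the classic two-step stringification). The names VERSION, STR, XSTR and the two functions are just examples:

```c
/* foo.c: a small chain of macros to inspect with gcc -E */
#define VERSION 42
#define STR(x)  #x        /* stringifies the argument exactly as written */
#define XSTR(x) STR(x)    /* expands the argument first, then applies STR */

/* STR sees the token VERSION itself, so this yields "VERSION". */
const char *as_written(void)
{
    return STR(VERSION);
}

/* XSTR expands VERSION to 42 before STR runs, so this yields "42". */
const char *expanded(void)
{
    return XSTR(VERSION);
}
```

After gcc -E foo.c | grep -v '^#', the two function bodies should read return "VERSION"; and return "42"; respectively, which shows the end result of each chain even though the intermediate steps are not displayed.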
Remember that the C preprocessor is mostly a textual transformation working at the file level. It is not possible to macro-expand a few lines in isolation (because prior lines might have played with #if combined with #define -perhaps in prior #include-d files- or preprocessor options such as -DNDEBUG passed to gcc or g++). On Linux see also feature_test_macros(7)
A known example of expansion which works differently when compiled with or without -DNDEBUG passed to the compiler is assert. The meaning of assert(i++ > 0) (a very wrong thing to code) depends on it and illustrates that macro-expansion cannot be done locally (and you might imagine some prior header having #define NDEBUG 1 even if of course it is poor taste).
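That dependence can be shown inside a single file, because <assert.h> may be included more than once and redefines assert each time according to whether NDEBUG is defined at that point. A sketch (the function names are made up):

```c
#define NDEBUG 1
#include <assert.h>

int value_with_ndebug(void)
{
    int i = 1;
    assert(i++ > 0);   /* expands to ((void)0), so i++ never runs */
    return i;          /* i is still 1 */
}

#undef NDEBUG
#include <assert.h>    /* assert is redefined: checks are active again */

int value_without_ndebug(void)
{
    int i = 1;
    assert(i++ > 0);   /* the condition is evaluated, so i becomes 2 */
    return i;          /* i is 2 */
}
```

Identical source text, two different expansions, depending only on what came earlier in the translation unit.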
Another example (very common actually) where the macro expansion is context dependent is any macro using __LINE__ or __COUNTER__
...
NB. You don't need Eclipse for all that, just a good enough source code editor (my preference is emacs but that is a matter of taste): for the preprocessing task you can use your compiler.
The only way to see what is wrong with your macro is to add the option that keeps the temporary files when compilation completes. For gcc it is the -save-temps option. You can then open the .i file and see the expanded macros.
IDE indexers (like Eclipse) will not help much. They will not expand the macros (as the other answer states) until the error occurs.
How many GCC optimization levels are there?
I tried gcc -O1, gcc -O2, gcc -O3, and gcc -O4
If I use a really large number, it won't work.
However, I have tried
gcc -O100
and it compiled.
How many optimization levels are there?
To be pedantic, there are 8 different valid -O options you can give to gcc, though there are some that mean the same thing.
The original version of this answer stated there were 7 options. GCC has since added -Og to bring the total to 8.
From the man page:
-O (Same as -O1)
-O0 (do no optimization, the default if no optimization level is specified)
-O1 (optimize minimally)
-O2 (optimize more)
-O3 (optimize even more)
-Ofast (optimize very aggressively to the point of breaking standard compliance)
-Og (Optimize debugging experience. -Og enables optimizations that do not interfere with debugging. It should be the
optimization level of choice for the standard edit-compile-debug cycle, offering a reasonable level of optimization
while maintaining fast compilation and a good debugging experience.)
-Os (Optimize for size. -Os enables all -O2 optimizations that do not typically increase code size. It also performs further optimizations
designed to reduce code size.
-Os disables the following optimization flags: -falign-functions -falign-jumps -falign-loops -falign-labels -freorder-blocks -freorder-blocks-and-partition -fprefetch-loop-arrays -ftree-vect-loop-version)
There may also be platform-specific optimizations; as @pauldoo notes, OS X has -Oz.
Let's interpret the source code of GCC 5.1
We'll try to understand what happens on -O100, since it is not clear on the man page.
We shall conclude that:
anything above -O3 up to INT_MAX is the same as -O3, but that could easily change in the future, so don't rely on it.
GCC 5.1 invokes undefined behavior if you enter integers larger than INT_MAX.
the argument can only have digits, or it fails gracefully. In particular, this excludes negative integers like -O-1
Focus on subprograms
First remember that GCC is just a front-end for cpp, as, cc1, collect2. A quick ./XXX --help says that only collect2 and cc1 take -O, so let's focus on them.
And:
gcc -v -O100 main.c |& grep 100
gives:
COLLECT_GCC_OPTIONS='-O100' '-v' '-mtune=generic' '-march=x86-64'
/usr/local/libexec/gcc/x86_64-unknown-linux-gnu/5.1.0/cc1 [[noise]] hello_world.c -O100 -o /tmp/ccetECB5.
so -O was forwarded to both cc1 and collect2.
O in common.opt
common.opt is a GCC specific CLI option description format described in the internals documentation and translated to C by opth-gen.awk and optc-gen.awk.
It contains the following interesting lines:
O
Common JoinedOrMissing Optimization
-O<number> Set optimization level to <number>
Os
Common Optimization
Optimize for space rather than speed
Ofast
Common Optimization
Optimize for speed disregarding exact standards compliance
Og
Common Optimization
Optimize for debugging experience rather than speed or size
which specify all the O options. Note how -O<n> is in a separate family from the other Os, Ofast and Og.
When we build, this generates an options.h file that contains:
OPT_O = 139, /* -O */
OPT_Ofast = 140, /* -Ofast */
OPT_Og = 141, /* -Og */
OPT_Os = 142, /* -Os */
As a bonus, while we are grepping for \bO\n inside common.opt we notice the lines:
-optimize
Common Alias(O)
which teaches us that --optimize (double dash because it starts with a dash -optimize on the .opt file) is an undocumented alias for -O which can be used as --optimize=3!
Where OPT_O is used
Now we grep:
git grep -E '\bOPT_O\b'
which points us to two files:
opts.c
lto-wrapper.c
Let's first track down opts.c
opts.c:default_options_optimization
All opts.c usages happen inside: default_options_optimization.
We grep backwards to see who calls this function, and we see that the only code path is:
main.c:main
toplev.c:toplev::main
opts-global.c:decode_opts
opts.c:default_options_optimization
and main.c is the entry point of cc1. Good!
The first part of this function:
calls integral_argument, which calls atoi on the string corresponding to OPT_O to parse the input argument
stores the value inside opts->x_optimize where opts is a struct gcc_opts.
struct gcc_opts
After grepping in vain, we notice that this struct is also generated at options.h:
struct gcc_options {
    int x_optimize;
    [...]
};
where x_optimize comes from the lines:
Variable
int optimize
present in common.opt, and that options.c contains:
struct gcc_options global_options;
so we guess that this is what contains the entire configuration global state, and int x_optimize is the optimization value.
255 is an internal maximum
In opts.c:integral_argument, atoi is applied to the input argument, so INT_MAX is an upper bound. If you pass anything larger, it seems that GCC invokes C undefined behaviour. Ouch?
integral_argument also thinly wraps atoi and rejects the argument if any character is not a digit. So negative values fail gracefully.
Back to opts.c:default_options_optimization, we see the line:
if ((unsigned int) opts->x_optimize > 255)
    opts->x_optimize = 255;
so the optimization level is capped at 255. While reading opth-gen.awk I had come across:
# All of the optimization switches gathered together so they can be saved and restored.
# This will allow attribute((cold)) to turn on space optimization.
and on the generated options.h:
struct GTY(()) cl_optimization
{
unsigned char x_optimize;
which explains the cap: the options must also be forwarded to cl_optimization, which uses an unsigned char to save space. So 255 is actually an internal maximum.
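Put together, the parsing steps described so far amount to something like the sketch below. This is a hypothetical re-creation for illustration, not GCC's code: parse_opt_level is a made-up name, and it deliberately uses strtol rather than atoi so that oversized input has defined behavior.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Returns -1 for anything that is not all digits, otherwise a level
   in the range 0..255. */
int parse_opt_level(const char *arg)
{
    const char *p;
    long value;

    /* Digits only: this is what makes "-1" or "2x" fail gracefully. */
    for (p = arg; *p != '\0'; p++)
        if (!isdigit((unsigned char)*p))
            return -1;

    errno = 0;
    value = strtol(arg, NULL, 10);     /* saturates on overflow and sets ERANGE */
    if (errno == ERANGE || value > 255)
        value = 255;                   /* must fit in an unsigned char */
    return (int)value;
}
```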
opts.c:maybe_default_options
Back to opts.c:default_options_optimization, we come across maybe_default_options which sounds interesting. We enter it, and then maybe_default_option where we reach a big switch:
switch (default_opt->levels)
  {
  [...]
  case OPT_LEVELS_1_PLUS:
    enabled = (level >= 1);
    break;
  [...]
  case OPT_LEVELS_3_PLUS:
    enabled = (level >= 3);
    break;
There are no >= 4 checks, which indicates that 3 is the largest possible.
Then we search for the definition of OPT_LEVELS_3_PLUS in common-target.h:
enum opt_levels
{
OPT_LEVELS_NONE, /* No levels (mark end of array). */
OPT_LEVELS_ALL, /* All levels (used by targets to disable options
enabled in target-independent code). */
OPT_LEVELS_0_ONLY, /* -O0 only. */
OPT_LEVELS_1_PLUS, /* -O1 and above, including -Os and -Og. */
OPT_LEVELS_1_PLUS_SPEED_ONLY, /* -O1 and above, but not -Os or -Og. */
OPT_LEVELS_1_PLUS_NOT_DEBUG, /* -O1 and above, but not -Og. */
OPT_LEVELS_2_PLUS, /* -O2 and above, including -Os. */
OPT_LEVELS_2_PLUS_SPEED_ONLY, /* -O2 and above, but not -Os or -Og. */
OPT_LEVELS_3_PLUS, /* -O3 and above. */
OPT_LEVELS_3_PLUS_AND_SIZE, /* -O3 and above and -Os. */
OPT_LEVELS_SIZE, /* -Os only. */
OPT_LEVELS_FAST /* -Ofast only. */
};
Ha! This is a strong indicator that there are only 3 levels.
opts.c:default_options_table
opt_levels is so interesting, that we grep OPT_LEVELS_3_PLUS, and come across opts.c:default_options_table:
static const struct default_options default_options_table[] = {
/* -O1 optimizations. */
{ OPT_LEVELS_1_PLUS, OPT_fdefer_pop, NULL, 1 },
[...]
/* -O3 optimizations. */
{ OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 },
[...]
}
so this is where the -On to specific optimization mapping mentioned in the docs is encoded. Nice!
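To see why -O100 ends up equivalent to -O3, here is a heavily simplified sketch of the table-plus-switch mechanism. The enum and struct are cut-down stand-ins for the GCC definitions quoted above, only the two table entries shown there are included, and count_enabled is a made-up function:

```c
#include <stddef.h>

/* Simplified stand-ins for the GCC names quoted above. */
enum opt_levels { OPT_LEVELS_1_PLUS, OPT_LEVELS_3_PLUS };

struct default_option {
    enum opt_levels levels;   /* which -O levels switch the flag on */
    const char *flag;         /* the flag being switched on */
};

static const struct default_option table[] = {
    { OPT_LEVELS_1_PLUS, "-fdefer-pop" },
    { OPT_LEVELS_3_PLUS, "-ftree-loop-distribute-patterns" },
};

/* Count the table entries that a given -O<level> would switch on. */
int count_enabled(int level)
{
    int n = 0;
    size_t i;

    for (i = 0; i < sizeof table / sizeof table[0]; i++) {
        switch (table[i].levels) {
        case OPT_LEVELS_1_PLUS: n += (level >= 1); break;
        case OPT_LEVELS_3_PLUS: n += (level >= 3); break;
        }
    }
    return n;
}
```

Since every comparison is of the form level >= k with k at most 3, any level from 3 upward selects exactly the same entries.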
Assure that there are no more uses for x_optimize
The main usage of x_optimize was to set other specific optimization options like -fdefer-pop as documented on the man page. Are there any more?
We grep, and find a few more. The number is small, and upon manual inspection we see that every usage does at most an x_optimize >= 3, so our conclusion holds.
lto-wrapper.c
Now we go for the second occurrence of OPT_O, which was in lto-wrapper.c.
LTO means Link Time Optimization, which as the name suggests is going to need an -O option, and will be linked to collect2 (which is basically a linker).
In fact, the first line of lto-wrapper.c says:
/* Wrapper to call lto. Used by collect2 and the linker plugin.
In this file, the OPT_O occurrences seem to only normalize the value of O to pass it forward, so we should be fine.
Seven distinct levels:
-O0 (default): No optimization.
-O or -O1 (same thing): Optimize, but do not spend too much time.
-O2: Optimize more aggressively
-O3: Optimize most aggressively
-Ofast: Equivalent to -O3 -ffast-math. -ffast-math triggers non-standards-compliant floating point optimizations. This allows the compiler to pretend that floating point numbers are infinitely precise, and that algebra on them follows the standard rules of real number algebra. It also tells the compiler to tell the hardware to flush denormals to zero and treat denormals as zero, at least on some processors, including x86 and x86-64. Denormals trigger a slow path on many FPUs, and so treating them as zero (which does not trigger the slow path) can be a big performance win.
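A small sketch of what "standard rules of real number algebra" means in practice. Under strict IEEE evaluation (that is, compiled without -ffast-math) the two groupings below give different answers for the same three inputs; with -ffast-math the compiler is allowed to treat them as interchangeable. The function names are made up.

```c
/* Adds left to right: (a + b) + c. */
double sum_left(double a, double b, double c)
{
    return (a + b) + c;
}

/* Adds right to left: a + (b + c). */
double sum_right(double a, double b, double c)
{
    return a + (b + c);
}
```

For a = 1e20, b = -1e20, c = 1.0, sum_left gives 1.0 (the large terms cancel first), while sum_right gives 0.0 (the 1.0 is lost when added to -1e20, because a double cannot represent that difference).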
-Os: Optimize for code size. This can actually improve speed in some cases, due to better I-cache behavior.
-Og: Optimize, but do not interfere with debugging. This enables non-embarrassing performance for debug builds and is intended to replace -O0 for debug builds.
There are also other options that are not enabled by any of these, and must be enabled separately. It is also possible to use an optimization option, but disable specific flags enabled by this optimization.
For more information, see GCC website.
Four (0-3): See the GCC 4.4.2 manual. Anything higher is just -O3, but at some point you will overflow the variable size limit.