I wonder if it's possible to make Intel C++ compiler (or other compilers such as gcc or clang) display some messages from optimizer. I would like to know what exactly optimizer did with my code. By default compiler prints only very basic things like unused variable. very simple example - I want to know that expression;
float x = 1.0f/2;
will be evaluated into:
float x = 0.5f;
and there will be no division in code (I know that in this case it's always true, but this is just an example). More advanced example could be loop unroll or operations reorder.
Thanks in advance.
For icc and icpc, you can use the -opt-report -opt-report-level max set of flags.
You can also specify an opt-report file. See here for more details
An optimizing compiler (like GCC, when asked to optimize with -O1 or -O2 etc...) is essentially transforming internal representations of your source code.
If you want to see some of the internal GCC representations, you could pass -fdump-tree-all to GCC. Beware, you'll get hundreds of dump files.
You could also use the MELT probe: MELT is a domain specific language (and plugin implementation) to extend GCC, and it has a probe mode to interactively show some of the internal (notably Gimple) representations.
The optimization you describe at the top of the post is (somewhat strangely) part of icc -fno-prec-div (which is a default which you might be overriding).
Related
Many questions about forcing the order of functions in a binary to match the order of the source file
For example, this post, that post and others
I can't understand why would gcc want to change their order in the first place?
What could be gained from that?
Moreover, why is toplevel-reorder default value is true?
GCC can change the order of functions, because the C standard (e.g. n1570 or newer) allows to do that.
There is no obligation for GCC to compile a C function into a single function in the sense of the ELF format. See elf(5) on Linux
In practice (with optimizations enabled: try compiling foo.c with gcc -Wall -fverbose-asm -O3 foo.c then look into the emitted foo.s assembler file), the GCC compiler is building intermediate representations like GIMPLE. A big lot of optimizations are transforming GIMPLE to better GIMPLE.
Once the GIMPLE representation is "good enough", the compiler is transforming it to RTL
On Linux systems, you could use dladdr(3) to find the nearest ELF function to a given address. You can also use backtrace(3) to inspect your call stack at runtime.
GCC can even remove functions entirely, in particular static functions whose calls would be inline expanded (even without any inline keyword).
I tend to believe that if you compile and link your entire program with gcc -O3 -flto -fwhole-program some non static but unused functions can be removed too....
And you can always write your own GCC plugin to change the order of functions.
If you want to guess how GCC works: download and study its source code (since it is free software) and compile it on your machine, invoke it with GCC developer options, ask questions on GCC mailing lists...
See also the bismon static source code analyzer (some work in progress which could interest you), and the DECODER project. You can contact me by email about both. You could also contribute to RefPerSys and use it to generate GCC plugins (in C++ form).
What could be gained from that?
Optimization. If the compiler thinks some code is like to be used a lot it may put that code in a different region than code which is not expected to execute often (or is an error path, where performance is not as important). And code which is likely to execute after or temporally near some other code should be placed nearby, so it is more likely to be in cache when needed.
__attribute__((hot)) and __attribute__((cold)) exist for some of the same reasons.
why is toplevel-reorder default value is true?
Because 99% of developers are not bothered by this default, and it makes programs faster. The 1% of developers who need to care about ordering use the attributes, profile-guided optimization or other features which are likely to conflict with no-toplevel-reorder anyway.
In the following link there is a section for non-simd intel intrinsics:
https://software.intel.com/sites/landingpage/IntrinsicsGuide/
These include assembly instructions like bsf and bsr. For SIMD instructions I can copy the c function and run it after including the proper header.
For the non-simd functions, like _bit_scan_reverse (bsr), I get that this function is undefined for gcc (implicit definition). GCC has similar "builtin functions" e.g. __builtin_ctz, but no _bit_scan_reverse or _mm_popcnt_u32. Why are these intrinsics not available?
#include <stdio.h>
#include <immintrin.h>
int main(void) {
int x = 5;
int y = _bit_scan_reverse (x);
printf("%d\n",y);
return 0;
}
It appears that I needed to have two changes:
First, it appears to be best practice to include x86intrin.h rather than more specific includes. This appears to be compiler specific and is covered in much better detail in:
Header files for x86 SIMD intrinsics
Importantly, you would have a different include if not using gcc.
Second, compiler options also need to be enabled. For gcc these are detailed in:
https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html
Although documentation for many flags are lacking.
As my goal is to distribute a compiled binary, I wanted to try and avoid -march=native
Most of the "other" intrinsics I'm interested in are bit manipulation related.
Ye Olde Wikipedia has a decent writeup of important bit manipulation intrinsic groups like bmi2:
https://en.wikipedia.org/wiki/Bit_Manipulation_Instruction_Sets
I need bmi2 for BZHI (instruction) or _bzhi_u32 (c)
Thus I can get what I want with something like:
-mavx2 -mbmi2
Using -mbmi2 seems to be sufficient to get things like bmi1 and abm (see linked Wikipedia page for definitions) although I don't see any mention of this in the linked gcc page so I might be wrong about this ... EDIT: It seems like adding bmi2 support does not add bmi1 and abm, I might have been using a __builtin call.... I later needed to add -mabm and -mbmi explicitly to get the instructions I wanted. As Peter Cordes suggested it is probably better to target Haswell -march=haswell as a starting point and then add on additional flags as needed. Haswell is the first processor with AVX2 from 2013 so in my mind -march=haswell is basically saying, I expect that you have a computer from 2013 or newer.
Also, based on some quick reading, it sounds like the use of __builtin enables the necessary flags (a future question for SO), although there does not appear to be a 1:1 correspondence between intrinsics and builtins. More specifically, not all intrinsics seem to be included as builtins, meaning the flag setting approach seems to be necessary, rather than just always using builtins and not worrying about setting flags. Also it is useful to know what intrinsics are being used, for distribution purposes, as it seems like bmi2 could still be missing on a substantial portion of computers (e.g. needing AMD from 2015+ - I think).
It's still not clear to me why just using the specified include in the Intel documentation doesn't work, but this info get's me 99% of the way to where I want to be.
I studied Option Summary for gfortran but found no compiler option to detect integer overflow. Then I found the GCC (GNU Compiler Collection) flag option -fsanitize=signed-integer-overflow here and used it when invoking gfortran. It works--integer overflow can be detected at run time!
So what does -fsanitize=signed-integer-overflow do here? Just adding to the machine code generated by gfortran some machine-level pieces that check integer overflow?
What is the relation between GCC (GNU Compiler Collection) flag options and gfortran compiler options ? What gcc compiler options can I use for gfortran, g++ etc ?
There is the GCC - GNU Compiler Collection. It shares the common backend and middleend and has frontends for different languages. For example frontends for C, C++ and Fortran which are usually invoked by commands gcc, g++ and gfortran.
It is actually more complicated, you can call gcc on a Fortran source and gfortran on a C source and it will work almost the same with the exceptions of libraries being linked (there are some other fine points). The appropriate frontend will be called based on the file extension or the language requested.
You can look almost all GCC (not just gcc) flags for all of the mentioned frontends. There are certain flags which are language specific. Normally you will get a warning like
gfortran -fcheck=all source.c
cc1: warning: command line option ‘-fcheck=all’ is valid for Fortran but not for C
but the file will compile fine, the option is just ignored and you will get a warning about that. Notice it is a C file and it is compiled by the gfortran command just fine.
The sanitization options are AFAIK not that language specific and work for multiple languages implemented in GCC, maybe with some exceptions for some obviously language specific checks. Especially -fsanitize=signed-integer-overflow which you ask about works perfectly fine for both C and C++. Signed integer overwlow is undefined behaviour in C and C++ and it is not allowed by the Fortran standard (which effectively means the same, Fortran just uses different words).
This isn't a terribly precise answer to your question, but an aha! moment, when learning about compilers, is learning that gcc (the GNU Compiler Collection), like llvm, is an example of a three-stage compiler.
The ‘front end’ parses the syntax of whichever language you're interested, and spits out an Abstract Syntax Tree (AST), which represents your program in a language-independent way.
Then the ‘middle end’ (terrible name, but ‘the clever bit’) reorganises that AST into another AST which is semantically equivalent but easier to turn into machine code.
Then the ‘back end’ turns that reorganised AST into assembler for one-or-other processor, possibly doing platform-specific micro-optimisations along the way.
That's why the (huge number of) gcc/llvm options are unexpectedly common to (apparently wildly) different languages. A few of the options are specific to C, or Fortran, or Objective-C, or whatever, but the majority of them (probably) are concerned with the middle and last bits, and so are common to all of the languages that gcc/llvm supports.
Thus the various options are specific to stage 1, 2 or 3, but may not be conveniently labelled as such; with this in mind, however, you might reasonably intuit what is and isn't relevant to the particular language you're interested in.
(It's for this sort of reason that I will dogmatically claim that CC++FortranJavaPerlPython is essentially a single language, with only trivial syntactical and library minutiae to distinguish between dialects).
Is it OK to say that 'volatile' keyword makes no difference if the compiler optimization is turned off i.e (gcc -o0 ....)?
I had made some sample 'C' program and seeing the difference between volatile and non-volatile in the generated assembly code only when the compiler optimization is turned on i.e ((gcc -o1 ....).
No, there is no basis for making such a statement.
volatile has specific semantics that are spelled out in the standard. You are asserting that gcc -O0 always generates code such that every variable -- volatile or not -- conforms to those semantics. This is not guaranteed; even if it happens to be the case for a particular program and a particular version of gcc, it could well change when, for example, you upgrade your compiler.
Probably volatile does not make much difference with gcc -O0 -for GCC 4.7 or earlier. However, this is probably changing in the next version of GCC (i.e. future 4.8, that is current trunk). And the next version will also provide -Og to get debug-friendly optimization.
In GCC 4.7 and earlier no optimizations mean that values are not always kept in registers from one C (or even Gimple, that is the internal representation inside GCC) instruction to the next.
Also, volatile has a specific meaning, both for standard conforming compilers and for human. For instance, I would be upset if reading some code with a sig_atomic_t variable which is not volatile!
BTW, you could use the -fdump-tree-all option to GCC to get a lot of dump files, or use the MELT domain specific language and plugin, notably its probe to query the GCC internal representations thru a graphical interface.
Gcc's -fdump-tree-optimized option dumps an optimized version of your C code as a C file. Is there a way I can do the same using intel's icc compiler?
I have a matrix multiplication code that I have compiled as icc -O3 -ipo mult.c. I want to view how the compiler has performed optimizations. If nothing works, then I shall generate the assembly code for the program.
Technically, -fdump-tree-optimized don't dump a C representation, but a textual partial representation of the Gimple code used inside GCC (Gimple is the middle-end internal representation of instructions, on which most GCC target-independent optimization passes operate).
But icc is a proprietary compiler (a black box), so from the point of view of its provider, it is not interesting (for Intel) to show how icc works.
GCC has the ability to show its internal representations, because it is a free software. Proprietary compilers don't want to show how they work.
If this is a class, you could perhaps try also LLVM. (But I don't know how do dump internal representations inside).
And more importantly, if this is a class, you might suggest your student using GCC 4.6 to develop a plugin or a GCC MELT extension to explore and experiment optimizations.
MELT is a high-level domain specific language to extend GCC and it provides many features to ease such tasks.