I have written a program with AVX intrinsics, which works well using Ubuntu 12.4 LTS and GCC 4.6 with the following compilation line: g++ -g -Wall -mavx ProgramName.cc -o ProgramName
The problem started When i have updated the compiler up to 4.7 and 4.8.1 versions to support the 16-bit AVX2 intrinsics, which is not supported in gcc 4.6
Currently, the updated gcc version compiles both AVX and AVX2 programs properly. However, it gives me the following error when i run the program: Illegal instruction (core dumped), although it was working on gcc 4.6
My question is: what is prefect way to compile and run both AVX and AVX2 intrinsics
If you tell gcc to use AVX2, it will do so, regardless of whether your CPU supports them or not. That can be useful for cross-compiling or for examining gcc's code generation, but it's not particularly helpful for running programs. If your program crashes with an illegal instruction exception, it is most likely that your CPU does not support the AVX2 extension.
On i386 and x86-64 platforms (and in certain other circumstances), you can specify the gcc option -march=native to generate code for the host machines instruction code. The compiled code might not work on another machine with fewer capabilities, but it should allow you to use all the features of your machine.
While -march=native is a good solution for generating executables, it does not actually help much with writing code; you still need to tailor the instrinsics for the target's architecture, and writing code which can take advantage of CPU features without relying on them gets complicated. I don't know of a good C solution, but there are several C++ template frameworks available.
Upgrading to gcc 4.8 likely pulled in AVX512, so you would have needed to limit the generated instr mix to ONLY AVX2 for your machine.
Related
I searched a little bit and one google search was enough to discover the differences between the gcc and cc compilers, but I did not find the advantages in using one or another to compile C programs
Which compiler should I use? and why?
The compiler installed as part of X-Code on OS/X is a recent version of clang whose development is sponsored by Apple.
gcc is not provided nor supported by Apple.
Unless you install gcc explicitly from one of its distributions, gcc is an alias for clang on OS/X, just like cc.
The reason for this is to support packages that use gcc explicitly as the C compiler.
On your system, it does not matter which alias you use, the compiler invoked will be clang, which has a high degree of compatibility with gcc extensions but generates different code. Both are very advanced and dependable.
I’m trying to programmatically detect at runtime what would be the best GCC -march flag for a given CPU. The program I’m developing will download optimized binaries depending on the user's CPU architecture. So I must detect which architecture it is in this list:
core2
nehalem
westmere
sandybridge
ivybridge
haswell
broadwell
generic (is all above failed).
Like this command: gcc -march=native -Q --help=target | grep -- '-march=' | cut -f3, without invoking GCC of course (I can’t embed it). I found the function __builtin_cpu_is of GCC, but it does not support all architectures.
I’m looking for a portable way of doing so, at least for GNU/Linux, Windows and OSX. I’m only planning to use GCC as compiler.
I am getting the following error while trying to compile some code for an ARM Cortex-M4
using
gcc -mcpu=cortex-m4 arm.c
`-mcpu=' is deprecated. Use `-mtune=' or '-march=' instead.
arm.c:1: error: bad value (cortex-m4) for -mtune= switch
I was following GCC 4.7.1 ARM options. Not sure whether I am missing some critical option. Any kickstart for using GCC for ARM will also be really helpful.
As starblue implied in a comment, that error is because you're using a native compiler built for compiling for x86 CPUs, rather than a cross-compiler for compiling to ARM.
GCC only supports a single general architecture type in any given compiler binary -- so, although the same copy of GCC can compile for both 32-bit and 64-bit x86 machines, you can't compile to both x86 and ARM with the same copy of GCC -- you need an ARM-specific GCC.
(As auselen suggests, getting a pre-built one will save you quite a lot of work, even if you're only using it as a starting point to get things set up. You need to have GCC, binutils, and a C library as a minimum, and those are all separate open-source projects that the pre-built versions have already done the work of combining. I'll recommend Sourcery CodeBench Lite since that's the one my company makes and I do think it's a fairly good one.)
As the error message says -mcpu is deprecated, and you should use the other options stated. However "deprectated" simply means that its use may not continue to be supported; it will still work.
ARM Cortex-M4 is ARM Architecture V7E-M, so you should use -march=armv7-m (the documentation does not specifically list armv7e-m, but that may have been added since the documentation was last updated. The E is essentially the difference between M3 and M4 - the DSP instructions, so the compiler will not generate code that takes advantage of these instructions. Using ARM's Cortex-M DSP library is probably the best way to use these instructions to benefit your application. If your part has an FPU, then other options will be needed enable code generation for that.
Like others already pointed out, you are using a compiler for your host machine, and you need a compiler for generating code for your target processor instead (a cross compiler). Like #Brooks suggested, you can use a pre-built toolchain, but if you want to roll out your own cross-compiler, libc and binutils, there is a nice tool called Crosstool-NG. It greatly simplifies the process of building a cross-compiler optimized to generate code for a specific processor, so you're not stuck with a generic prebuilt toolchain, which usually builds code for a family of compatible processors (e.g. you could tune the toolchain for generating ASM for your specific target, or floating point code for a hardware FPU which is specific to your processor, instead of using only software floating point routines, which are default to most pre-built toolchains).
I am working on Nehalam/westmere Intel micro architecture CPU. I want to optimize my code for this Architecture. Are there any specialized compilation flags or C functions by GCC which will help me improve my code's run time performance?
I am already using -O3.
Language of the Code - C
Platform - Linux
GCC Version - 4.4.6 20110731 (Red Hat 4.4.6-3) (GCC)
In my code I have some floating point comparison and they are done over a million time.
Please assume the code is already best optimized.
First, if you really want to profit from optimization on newer processors like this one, you should install the newest version of the compiler. 4.4 came out some years ago, and even if it still seems maintainted, I doubt that the newer optimization code is backported to that. (Current version is 4.7)
Gcc has a catch-all optimization flag that usually should produce code that is optimized for the compilation architecture: -march=native. Together with -O3 this should be all that you need.
Warning: the answer is incorrect.
You can actually analyze all disabled and enabled optimizations yourself. Run on your computer:
gcc -O3 -Q --help=optimizers | grep disabled
And then read about the flags that are still disabled and can according to the gcc documentation influence performance.
You'll want to add an -march=... option. The ... should be replaced with whatever is closest to your CPU architecture (there tend to be minor differences) described in the i386/x86_64 options for GCC here.
I would use core2 because corei7 (the one you'd want) is only available in GCC 4.6 and later. See the arch list for GCC 4.6 here.
If you really want to use a gcc so old that it doesn't support corei7, you could use -mtune=barcelona
Recently I've been playing around with cross compiling using GCC and discovered what seems to be a complicated area, tool-chains.
I don't quite understand this as I was under the impression GCC can create binary machine code for most of the common architectures, and all that else really matters is what libraries you link with and what type of executable is created.
Can GCC not do all these things itself? With a single build of GCC, all the appropriate libraries and the correct flags sent to GCC, could I produce a PE executable for a Windows x86 machine, then create an ELF executable for an embedded Linux MIPS device and finally an executable for an OSX PowerPC machine?
If not can someone explain how you would achieve this?
With a single build of GCC, all the
appropriate libraries and the correct
flags sent to GCC, could I produce a
PE executable for a Windows x86
machine, then create an ELF executable
for an embedded Linux MIPS device and
finally an executable for an OSX
PowerPC machine? If not can someone
explain how you would achieve this?
No. A single build of GCC produces object code for one target architecture. You would need a build targeting Intel x86, a build targeting MIPS, and a build targeting PowerPC. However, the compiler is not the only tool you need, despite the fact that you can build source code into an executable with a single invocation of GCC. Under the hood, it makes use of the assembler (as) and linker (ld) as well, and those need to be built for the target architecture and platform. Usually GCC uses the versions of these tools from the GNU binutils package, so you'd need to build that for the target platform too.
You can read more about building a cross-compiling toolchain here.
I don't quite understand this as I was
under the impression GCC can create
binary machine code for most of the
common architectures
This is true in the sense that the source code of GCC itself can be built into compilers that target various architectures, but you still require separate builds.
Regarding -march, this does not allow the same build of GCC to switch between platforms. Rather it's used to select the allowable instructions to use for the same family of processors. For example, some of the instructions supported by modern x86 processors weren't supported by the earliest x86 processors because they were introduced later on (such as extension instruction sets like MMX and SSE). When you pass -march, GCC enables all opcodes supported on that processor and its predecessors. To quote the GCC manual:
While picking a specific cpu-type will
schedule things appropriately for that
particular chip, the compiler will not
generate any code that does not run on
the i386 without the -march=cpu-type
option being used.
If you want to try cross-compiling, and don't want to build the toolchain yourself, I'd recommend looking at CodeSourcery. They have a GNU-based toolchain, and their free "Lite" version supports quite a few architectures. I've used it for Linux/ARM and Android/ARM.