I know that it's generally recommended to use -march=native (if you're compiling for the machine you're on), so that gcc determines your arch and cputype and generates the most machine-specific code, but how does it do that?
Does it use cpuid (on either arm or x86)? What techniques are used on platforms without a cpuid-like instruction?
Good question.
My intuition was that it would go check /proc/cpuinfo. Actually it depends on the architecture it has been compiled to run on. It seems like host_detect_local_cpu is the function responsible for that. Its job is to replace -march=native by either the good -march=<...> or a set of flags (-mmmx, -mno-avx, etc) that match as closely as possible the current cpu. Examples for i386 and arm. i386 uses cpuid directly to check for each and every possible feature. arm checks in /proc/cpuinfo for the CPU part line and has a table that maps from a CPU part value to a generation of the architecture and use this directly in -march=<...>.
Just for fun I check on other architectures (I am not familiar with them).
solaris on sparc: uses the kstat interface
linux on sparc: uses /proc/cpuinfo
alpha: uses the implver instruction
darwin on rs6000: uses the hw.cpusubtype system call
freebsd on rs6000: uses hardcoded powerpc
linux on rs6000: checks in /proc/self/auxv for the platform value in the elf interpreter of its own process
aix on rs6000: uses _system_configuration (apparently a global structure)
Related
I am writing real mode function, which should be normal function with stackframes and so, but it should use %sp instead of %esp. Is there some way to do it?
GCC 5.2.0 (and possible earlier versions) support 16-bit code generation with the -m16 flag. However, the code will almost certainly rely on 32-bit processor features (such as 32-bit wide registers), so you should check the generated assembly carefully.
From the man pages:
The -m16 option is the same as -m32, except for that it outputs the
".code16gcc" assembly directive at the beginning of the assembly output
so that the binary can run in 16-bit mode.
Firstly, gcc could build 16bit code, because the linux kernel is go through realmode to protectmode, so it could even build 16bit c code.
Then, -m16 option is supported by GCC >= 4.9 and clang >= 3.5
gcc will ignore asm(".code16"),you can see it by -S output the assembly code surround by #APP #NO_APP
the linux kernel do the trick to compile 16bit c with a code16gcc.h(only have .code16gcc) pass to gcc compile params directly.
see Build 16-bit code with -m16 where possible, also see the linux kernel build Makefile
if you direct put the asm(".code16gcc"), see Writing 16-bit Code, it's not real 16bit code, call, ret, enter, leave, push, pop, pusha, popa, pushf, and popf instructions default to 32-bit size
GCC does not produce 8086 code. The GNU AS directive .code16gcc can be used to assemble the output of GCC to run in a 16-bit mode, put asm(".code16gcc") at the start of your C source, your program will be limited to 64Kibytes.
On modern GCC versions you can pass the -m16 argument to gcc which will produce code to run in a 16-bit mode. It still requires a 386 or later.
As far as I know, GCC does not support generation of code for 16-bit x86. For legacy bootloaders and similar purposes, you should write a small stub in assembly language to put the cpu in 32-bit mode and pass off execution to 32-bit code. For other purposes you really shouldn't be writing 16-bit code.
I'm writing a program using Intel intrinsics. I want to use _mm_permute_pd intrinsic, which is only available on CPUs with AVX. For CPUs without AVX I can use _mm_shuffle_pd but according to the specs it is much slower than _mm_permute_pd. Do the header files for Intel intrinsics define constants that allow me to distinguish whether AVX is supported so that I can write sth like this:
#ifdef __IS_AVX_SUPPORTED__ // is there sth like this defined?
// use _mm_permute_pd
# else
// use _mm_shuffle_pd
#endif
? I have found this tutorial, which shows how to perform a runtime check but I need to do a static, compile-time check for the current machine.
GCC, ICC, MSVC, and Clang all define a macro __AVX__ which you can check. In fact it's the only SIMD constant defined by all those compilers (MSVC is the one that breaks the mold). This only tells you if your code was compiled with AVX support (e.g. -mavx with GCC or /arch:AVX with MSVC) it does not tell you if your CPU supports AVX. If you want to know if the CPU supports AVX you need to check CPUID. Here, asm-in-c-error, is an example to read CPUID from all those compilers.
To do this properly I suggest you make a CPU dispatcher.
Edit: In case anyone wants to know how to use the values from CPUID to find out if AVX is available see https://github.com/Mysticial/FeatureDetector
I assume you are using Intel C++ Compiler. In this case - yes, there are such macros: Intel C++ Compiler Reference Guide: __AVX__, __AVX2__.
P.S. Be aware that if you compile you application with AVX instruction set enabled it will fail on CPUs not supporting AVX. If you are going to distribute your software as source code package and compile on target machine - this is may be a viable solution. Otherwise you should check for AVX dynamically.
P.P.S. There are several options for ICC. Take a look at the following compiler options and also references from it to other.
It seems to me that the only way is to compile and run a program that identifies whether AVX is available. Then manually or automatically compile separate code with or without AVX functions. For VS 2013, I would used my code in commomAVX folder in the following to identify hasAVX (or not) and use this to execute one of two different BAT files to compile and link the appropriate program.
http://www.roylongbottom.org.uk/gigaflops-benchmarks.zip
My question was to help to identify a solution regarding the use of suitable compile options such as /arch:AVX.
I have rootfs and klibc file systems. I am creating make rules and some developers have an older compiler without inter-networking.note1 I am trying to verify that all the files get built with arm only when a certain version of the compiler is detected. I have re-built the tree's several times. I was using readelf -A and looking for Tag_THUMB_ISA_use: Thumb-1, but this seem to be in arm only code (but was built with the interworking compiler) as well as thumb code. I can manually run objdump -S and examine the assembler to determine what instruction set is in use.
However, it would be much easier if I had a script/tool predicate so that find, etc can be used to search through the shadow file systems to look for binaries that may have been missed. I thought that some of this information would be in the ELF header and accessible via objdump or readelf, but I haven't found anything reliable.
Specifically I am looking for,
Compiled 'C' that wouldn't run without a CONFIG_ARM_THUMB Linux system.
make rules that use 'C' compiler flags that choke a non-thumb compilers.
note1: Interworking allow easy switching between thumb and arm modes, and the compiler will automatically generate code to support calling from either mode.
The readelf -A output doesn't describe the elf contents. It just describes the capabilities of the processor and or system that is expected or fed to the compiler. As I have an ARM926 CPU which is an ARMV5TEJ processor, gcc/ld will always set Tag_THUMB_ISA_use: Thumb-1 as it just means that ARMV5TEJ is recognized as being Thumb-1 capable. It says nothing about the code itself.
Examining the Linux arch/arm/kernel/elf.c routine elf_check_arch() shows a check for x->e_entry & 1. This leads to the following script,
readelf -h $1 | grep -q Entry.*[13579bdf]$
Ie, just look at the initial ELF entry value and see if the low bit is set. This is a fast check that fits the spirit of what I am looking for. unixsmurf has a good point that the code inside any ELF can mix and match ARM and Thumb. This maybe ok, if the program dynamically ids the CPU and selects an appropriate routine. Ie, just the presence of a Thumb instruction doesn't mean that code will execute.
Just looking at the entry value does determine which gcc compiler flags were used, at least for gcc versions 4.6 to 4.7.
Since thumb and arm sequences can be freely interchanged within an object file, even within the same section, plain ELF header inspection is not going to help you whether a file includes Thumb instructions or not.
A slightly roundabout and still not 100% foolproof way would be to use readelf -r and check if the output contains "R_ARM_THM", indicating a relocation for thumb.
I have a question about how compiler-set symbols, in particular CPU feature flags (like SSE, AES, AVX) are actually set. For instance, if I call gcc with -mavx, is the __AVX__ symbol set regardless of whether the system the code is about to be built on actually supports AVX instructions, or does it check before?
I'm asking because I need to build a particular code path depending on CPU capabilities and would like to automate it so that the correct path is determined upon compilation based on the build system, instead of manually enabling desired features. But since the only CPU I have supports basically every feature, I cannot test my above assumption (first world problems, I know)
There is going to be a lot of code so simply keeping everything and branching at runtime is unacceptable - and it is assumed that my library will be built before being used on a given system anyway.
I mean, at worst I can force this behavior by wrapping the gcc arguments in a cpuid-aware script, but if gcc does it automatically it would be preferable. So does anyone know whether it does?
I am mostly interested in gcc's take on this but I am also curious to know how other C compilers behave.
If you pass the -mavx flag, __AVX__ will always be set for the resulting compilation (and the resulting code may not run on non-AVX machines).
If you pass the -march=native flag, gcc will enable the instruction sets supported by the build machine, so __AVX__ will only be set if the build machine supports it.
I am compiling on a 64 bit architecture with the intel C compiler. The same code built fine on a different 64 bit intel architecture.
Now when I try to build the binaries, I get a message "Skipping incompatible ../../libtime.a" or some such thing, that is indicating the libtime.a that I archived (from some object files I compiled) is not compatible. I googled and it seemed like this was usually the result of a 32->64 bit changeover or something like that, but the intel C compiler doesnt seem to support a -64 or some other memory option at compile time. How do I troubleshoot and fix this error?
You cannot mix 64-bit and 32-bit compiled code. Config instructions for Linux are here.
You need to determine the target processor of both the library and the new code you are building. This can be done in a few ways but the easiest is:
$ objdump -f ../../libtime.a otherfile.o
For libtime this will probably print out bunches of things, but they should all have the same target processor. Make sure that otherfile.o (which you should substitute one of your object files for) also has the same architecture.
gcc has the -m32 and -m64 flags for switching from the default target to a similar processor with the different register and memory width (commonly x86 and x86_64), which the Intel C compiler may also have.
If this has not been helpful then you should include the commands (with all flags) used to compile everything and also information about the systems that each command was being run on.