The issue I am having is with some NEON instructions, which I believe are supported on the ARMv7 architecture. I am using the default compiler (Apple LLVM 5.0). It recognises other NEON instructions, but it does not accept the half-float instruction.
Here is the code:
vcvt.f32.f16 q0, d1
This compiles with GCC, but the Apple compiler rejects the instruction with the error: Instruction requires: half-float
Is there a compiler flag I can give to Xcode? I couldn't find out how to enable the half-float instructions by searching around.
Thanks!
The half-float format is actually not supported on all ARMv7 implementations. See the ARM manual here. It is required by VFPv4, so if your chip supports that, that's a good start. In general I would recommend run-time detection and dispatching. To enable the instruction at compile time, you need to use one of several floating-point support options; "fp16" is the keyword, for example:
-mfpu=neon-fp16 if you are sure that your target supports it for NEON. I couldn't find all of the options for LLVM either, but I think they are generally compatible with the GCC options, found in the GCC manual.
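If you go the run-time dispatch route, you also need a code path for chips without the half-float extension. Here is a minimal sketch of such a software fallback in plain C. The function name half_to_float is made up for illustration, and the feature detection itself (which is platform-specific) is not shown.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical software fallback: convert one IEEE 754 half-precision
 * value (given as its raw 16-bit pattern) to a 32-bit float. */
float half_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16; /* move sign to bit 31 */
    uint32_t exp  = (h >> 10) & 0x1Fu;             /* 5-bit exponent */
    uint32_t mant = h & 0x3FFu;                    /* 10-bit mantissa */
    uint32_t bits;
    float f;

    if (exp == 0) {
        if (mant == 0) {
            bits = sign;                           /* signed zero */
        } else {
            /* Subnormal half: shift until the implicit leading 1 appears. */
            int shift = 0;
            while ((mant & 0x400u) == 0) {
                mant <<= 1;
                shift++;
            }
            mant &= 0x3FFu;
            bits = sign | ((uint32_t)(127 - 15 + 1 - shift) << 23) | (mant << 13);
        }
    } else if (exp == 0x1Fu) {
        bits = sign | 0x7F800000u | (mant << 13);  /* infinity or NaN */
    } else {
        /* Normal number: rebias the exponent from 15 to 127. */
        bits = sign | ((exp + (127 - 15)) << 23) | (mant << 13);
    }

    memcpy(&f, &bits, sizeof f);                   /* reinterpret bits safely */
    return f;
}
```

A dispatcher would call this when the hardware check fails and use the vcvt path otherwise.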
I am trying to use _BitScanForward64 intrinsic (https://software.intel.com/sites/landingpage/IntrinsicsGuide/#expand=381,421,373&cats=Bit%20Manipulation) using MinGW 64 on Windows (GCC 4.8.1).
I tried:
#include "immintrin.h"
#include "x86intrin.h"
#include "xmmintrin.h"
but it still doesn't see the function (according to Intel guide it should be in "immintrin.h").
How do I get it to work using MinGW64 on Windows?
GCC does not use an intrinsic for this. It uses a built-in function, __builtin_ffs. Wikipedia has a nice summary for each compiler.
The reason Intel lists this intrinsic, I think, is that the Intel C++ compiler tries to support the same intrinsics that Microsoft created, at least when used in Visual Studio on Windows. This unfortunately means that some intrinsics are not defined for every compiler. Even worse, sometimes they are defined differently. For example, Intel's definition of _addcarry_u64 disagrees with Microsoft's definition, but looking at the assembly shows that Intel uses Microsoft's definition and GCC and Clang use Intel's definition.
For most x86 SIMD intrinsics GCC, Intel, Clang, and MSVC agree (with a few exceptions coming from MSVC) but for other intrinsics you have to get used to them being only defined in some compilers or being defined differently or having to use builtin functions or a library function.
The Intel compiler even has its own intrinsic for this: _bit_scan_forward.
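If you want the same call shape on GCC, one option is a small wrapper around a GCC built-in. This is only a sketch: the name bit_scan_forward64 is made up, and it assumes GCC or Clang for __builtin_ctzll, with a plain loop as the fallback elsewhere.

```c
#include <stdint.h>

/* Hypothetical stand-in for _BitScanForward64: stores the index of the
 * lowest set bit in *index and returns 1, or returns 0 if mask is zero. */
unsigned char bit_scan_forward64(unsigned long *index, uint64_t mask)
{
    if (mask == 0) {
        return 0;               /* no bit set, *index is left untouched */
    }
#if defined(__GNUC__) || defined(__clang__)
    /* __builtin_ctzll counts trailing zero bits; undefined for 0,
     * which is why the zero case is handled above. */
    *index = (unsigned long)__builtin_ctzll(mask);
#else
    {
        unsigned long i = 0;
        while ((mask & 1u) == 0) {
            mask >>= 1;
            i++;
        }
        *index = i;
    }
#endif
    return 1;
}
```

Note that __builtin_ffs returns a one-based position (0 for no bits set), while this wrapper returns a zero-based index to match the MSVC intrinsic.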
There are lots of examples of using ARM NEON intrinsics for Android, with the NDK even having an example. I've gotten that to work with no problem.
ARM also offers the ACLE (ARM C Language Extensions), but I can find next to nothing by way of examples. The ARM document itself merely suggests including the arm_acle.h header file, but I still get errors. Google has offered almost zero assistance :) Searching the ARM community boards has also yielded little by way of results.
Do people not use the ACLE, and choose inline assembly instead?
When I include arm_acle.h and attempt to use the __ssat() call, I have to further define a directive __ARM_FEATURE_CRC32, and when building I get the error: error: '__builtin_arm_qadd' was not declared in this scope
The header doesn't seem to include any dependencies, and the documentation lists no specific link dependencies.
Any advice?
Or am I overlooking something fundamental here?
Additional Information:
My target arch is armv7-a-neon and is correctly detected in the make file at build time.
I then further define "-mfloat-abi=softfp -mfpu=neon -march=armv7", but to no avail.
If I undo my additional debugging defines, I simply get: error: #error "ACLE intrinsics support not enabled." (NEON support and detection succeed.)
Searching my code base, the arm_acle.h header file is only present for the clang host tools, whereas arm_neon.h is present in several prebuilt ARM tool directories.
As I said, arm_neon detection works fine and the code runs fine; it's the arm_acle component that doesn't work.
Searching the online repositories like http://androidxref.com seems to suggest only neon is supported?
The ARM C Language Extensions are currently not fully supported in GCC (as of version 5.1). The Android NDK normally uses a version of GCC older than this, which also does not have full support for ACLE.
This page https://gcc.gnu.org/onlinedocs/gcc/ARM-C-Language-Extensions-_0028ACLE_0029.html gives some idea of the current level of implementation of ACLE for both ARM and AArch64 targets. As you'll see there, the only features of ACLE currently provided by GCC are the CRC32 intrinsics in arm_acle.h and the Neon Intrinsics you've already found in arm_neon.h.
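Until the toolchain provides __ssat, a plain-C fallback is one way to keep going. The sketch below is an assumption about what you need, not the ACLE implementation: the name ssat_portable is made up, and it saturates a value to a signed range of the given bit width (1 to 32).

```c
#include <stdint.h>

/* Hypothetical portable replacement for a signed-saturate operation:
 * clamp x to the range of a signed integer that is `bits` wide.
 * `bits` must be between 1 and 32 inclusive. */
int32_t ssat_portable(int32_t x, unsigned bits)
{
    /* Work in 64 bits so that bits == 32 does not overflow. */
    int64_t max = ((int64_t)1 << (bits - 1)) - 1;
    int64_t min = -max - 1;

    if (x > max) {
        return (int32_t)max;
    }
    if (x < min) {
        return (int32_t)min;
    }
    return x;
}
```

For example, ssat_portable(300, 8) clamps to 127 and ssat_portable(-300, 8) clamps to -128. A compiler targeting ARM may or may not turn this into a single SSAT instruction, so check the generated assembly if that matters.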
I am getting the following error while trying to compile some code for an ARM Cortex-M4
using
gcc -mcpu=cortex-m4 arm.c
`-mcpu=' is deprecated. Use `-mtune=' or '-march=' instead.
arm.c:1: error: bad value (cortex-m4) for -mtune= switch
I was following GCC 4.7.1 ARM options. Not sure whether I am missing some critical option. Any kickstart for using GCC for ARM will also be really helpful.
As starblue implied in a comment, that error is because you're using a native compiler built for compiling for x86 CPUs, rather than a cross-compiler for compiling to ARM.
GCC only supports a single general architecture type in any given compiler binary -- so, although the same copy of GCC can compile for both 32-bit and 64-bit x86 machines, you can't compile to both x86 and ARM with the same copy of GCC -- you need an ARM-specific GCC.
(As auselen suggests, getting a pre-built one will save you quite a lot of work, even if you're only using it as a starting point to get things set up. You need to have GCC, binutils, and a C library as a minimum, and those are all separate open-source projects that the pre-built versions have already done the work of combining. I'll recommend Sourcery CodeBench Lite since that's the one my company makes and I do think it's a fairly good one.)
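A quick way to confirm which kind of compiler you are actually invoking is to look at the target macros it predefines. The following is just a sketch; the function name target_arch is made up, while the macros checked are the ones GCC and Clang predefine for these targets.

```c
/* Hypothetical helper: report the architecture this translation unit
 * was compiled for, based on compiler-predefined target macros. */
const char *target_arch(void)
{
#if defined(__arm__) || defined(__thumb__)
    return "arm (32-bit)";      /* what an ARM cross-compiler reports */
#elif defined(__aarch64__)
    return "aarch64";
#elif defined(__x86_64__)
    return "x86-64";            /* a native PC compiler reports this */
#elif defined(__i386__)
    return "x86 (32-bit)";
#else
    return "unknown";
#endif
}
```

If this returns one of the x86 strings when built with the gcc on your path, you are using the native compiler and need an ARM-specific one instead.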
As the error message says, -mcpu is deprecated, and you should use the other options stated. However, "deprecated" simply means that its use may not continue to be supported; it will still work.
ARM Cortex-M4 is ARM Architecture v7E-M, so you should use -march=armv7-m (the documentation does not specifically list armv7e-m, but that may have been added since the documentation was last updated). The E is essentially the difference between M3 and M4 - the DSP instructions - so with armv7-m the compiler will not generate code that takes advantage of these instructions. Using ARM's Cortex-M DSP library is probably the best way to use these instructions to benefit your application. If your part has an FPU, then other options will be needed to enable code generation for that.
Like others already pointed out, you are using a compiler for your host machine, and you need a compiler that generates code for your target processor instead (a cross-compiler). Like @Brooks suggested, you can use a pre-built toolchain, but if you want to roll your own cross-compiler, libc and binutils, there is a nice tool called Crosstool-NG. It greatly simplifies the process of building a cross-compiler optimized to generate code for a specific processor, so you're not stuck with a generic prebuilt toolchain, which usually builds code for a family of compatible processors (e.g. you could tune the toolchain to generate ASM for your specific target, or floating-point code for a hardware FPU which is specific to your processor, instead of using only software floating-point routines, which are the default in most pre-built toolchains).
What's the differences between arm-eabi-gcc and arm-elf-gcc?
Can they both compile the same source code for cortex-m3 arch?
arm-elf-gcc is the old toolchain supporting the legacy floating-point accelerator (FPA) and the mixed-endian floating-point format.
arm-eabi-gcc is the newer generation of toolchain supporting the VFP floating-point format.
I imagine they can compile the same source, but the latter is newer, so it should be richer feature-wise. What you want to use depends on which OS / libraries you are compiling against. Toolchain, fundamental libraries and OS go arm in arm. They need to have the same ABIs.
I am working on a CPU with the Intel Nehalem/Westmere microarchitecture. I want to optimize my code for this architecture. Are there any specialized compilation flags or C functions provided by GCC which will help me improve my code's run-time performance?
I am already using -O3.
Language of the Code - C
Platform - Linux
GCC Version - 4.4.6 20110731 (Red Hat 4.4.6-3) (GCC)
In my code I have some floating-point comparisons, and they are done over a million times.
Please assume the code is already best optimized.
First, if you really want to profit from optimization on newer processors like this one, you should install the newest version of the compiler. 4.4 came out some years ago, and even if it still seems to be maintained, I doubt that the newer optimization code is backported to it. (Current version is 4.7)
GCC has a catch-all optimization flag that usually should produce code that is optimized for the compilation architecture: -march=native. Together with -O3 this should be all that you need.
Warning: the answer is incorrect.
You can actually analyze all disabled and enabled optimizations yourself. Run on your computer:
gcc -O3 -Q --help=optimizers | grep disabled
And then read about the flags that are still disabled and can according to the gcc documentation influence performance.
You'll want to add an -march=... option. The ... should be replaced with whatever is closest to your CPU architecture (there tend to be minor differences) described in the i386/x86_64 options for GCC here.
I would use core2 because corei7 (the one you'd want) is only available in GCC 4.6 and later. See the arch list for GCC 4.6 here.
If you really want to use a gcc so old that it doesn't support corei7, you could use -mtune=barcelona
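To check that an -march=... choice really enabled the instruction sets you expect, you can look at the macros GCC predefines for them. A small sketch follows; the function name sse_level and its return codes are made up for illustration.

```c
/* Hypothetical helper: report the highest SSE level that the compiler
 * flags enabled for this translation unit.
 * Return codes (an arbitrary convention for this sketch):
 * 42 = SSE4.2, 41 = SSE4.1, 31 = SSSE3, 30 = SSE3, 20 = SSE2, 0 = none. */
int sse_level(void)
{
#if defined(__SSE4_2__)
    return 42;
#elif defined(__SSE4_1__)
    return 41;
#elif defined(__SSSE3__)
    return 31;
#elif defined(__SSE3__)
    return 30;
#elif defined(__SSE2__)
    return 20;
#else
    return 0;
#endif
}
```

Building this once with and once without your -march option and comparing the results shows whether the flag had any effect on the enabled instruction sets.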