Is Neon coprocessor being used? - c

I am using the FFTW3 library on a BeagleBoard-xM in a C application to perform r2c FFTs of floats. I read on this page that FFTW3 includes support for NEON, which is part of the xM architecture.
Is there a way to tell if the NEON coprocessor is actually being used?
For example, can I list symbols from the object files and parse for some special NEON symbols? Alternatively, can I look through the gcc -S assembler output for any NEON instructions? Which instruction(s) would I look for? (I'm not familiar with what NEON assembly looks like.)

Look at the disassembly. NEON instructions that operate on float data have a .f32 suffix, and the NEON registers have names of the form dN or qN (where N is an integer). So if you see instructions that look like:
vadd.f32 q0, q1, q2
then NEON is being used. (The same mnemonic with sN registers, such as vadd.f32 s0, s1, s2, is a scalar VFP instruction, not NEON.)
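To get a feel for what to look for, here is a minimal sketch you could compile yourself. The names built_with_neon and add_f32 are made up for illustration and have nothing to do with FFTW3. The first function only reports whether this particular file was compiled with NEON enabled (via the ACLE macro __ARM_NEON); it says nothing about how a prebuilt library was compiled. The second is a plain loop that the compiler may turn into vadd.f32 on q registers.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if THIS translation unit was compiled
 * with NEON enabled (for example with -mfpu=neon), 0 otherwise. */
int built_with_neon(void)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical example loop: element-wise float addition.
 * With suitable flags the auto-vectorizer may emit NEON code for it. */
void add_f32(float *restrict dst, const float *restrict a,
             const float *restrict b, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

Assuming the file is saved as add.c, something like gcc -O3 -mfpu=neon -mfloat-abi=softfp -funsafe-math-optimizations -S add.c should produce add.s, which you can search for .f32 instructions on q registers. On 32-bit ARM, GCC only auto-vectorizes floating-point code for NEON when -funsafe-math-optimizations (or -ffast-math) is given, because NEON arithmetic is not fully IEEE 754 compliant. For a prebuilt library, running objdump -d on the library file and searching the output the same way is the equivalent check.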

Related

ARMv8A AArch64 vmlal_high_s8 Intrinsics

I'm looking for the intrinsic corresponding to the operation 'SMLAL2 Vd.8H,Vn.16B,Vm.16B', which according to ARM's own documentation (ARM Neon Intrinsics Ref) should be something like
int16x8_t vmlal_high_s8 (int16x8_t a,int8x16_t b,int8x16_t c)
However, the arm_neon.h included in ARM's GNU toolchain has nothing corresponding to it. So my question is whether I just need to include something else, or whether I can work around this problem some other way.
Thanks in advance!
For anyone else hitting this problem: I had chosen the ARM embedded toolchain instead of the Linaro one. The Linaro toolchain is the one suitable for AArch64.
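For reference, here is a small sketch of what the operation computes. The wrapper name mlal_high_s8 is hypothetical. When compiled by an AArch64 toolchain it calls vmlal_high_s8; otherwise it falls back to a scalar loop with the same meaning: each of the 8 accumulator lanes gets the product of the corresponding elements from the upper halves of the two 16-byte inputs added to it.

```c
#include <stdint.h>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Hypothetical wrapper: acc[i] += b[8 + i] * c[8 + i] for i = 0..7,
 * i.e. a widening multiply-accumulate on the high halves (SMLAL2). */
void mlal_high_s8(int16_t acc[8], const int8_t b[16], const int8_t c[16])
{
#if defined(__aarch64__)
    int16x8_t a = vld1q_s16(acc);
    a = vmlal_high_s8(a, vld1q_s8(b), vld1q_s8(c));
    vst1q_s16(acc, a);
#else
    /* Scalar fallback for toolchains that do not target AArch64. */
    for (int i = 0; i < 8; ++i)
        acc[i] = (int16_t)(acc[i] + (int16_t)b[8 + i] * (int16_t)c[8 + i]);
#endif
}
```

On a 32-bit ARM toolchain with NEON, the same result can be obtained with vmlal_s8(a, vget_high_s8(b), vget_high_s8(c)), since the _high variants are specific to AArch64.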

memcpy optimization in cortex-a8 arm

I use memcpy() in my implementation on an ARM Cortex-A8.
This is the first code I have developed for ARM processors.
I read at the following link that I can improve performance through several strategies:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka13544.html
My code looks like this:
..
memcpy(myvar1, myvar2, myvar_size * sizeof(var_complex));
..
How can I optimize this code for the ARM Cortex-A8 using the Eclipse GCC toolchain?
My code contains both C and assembly.
Does that affect which registers can be used?
I searched for examples and didn't find any.
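As a starting point, here is a rough sketch of a NEON-based copy written with intrinsics instead of hand-written assembly. The function name copy_bytes is made up, and this is not the routine from the linked ARM article. It moves 16 bytes per iteration through a q register when the file is compiled with NEON enabled, and otherwise just calls memcpy. Because it uses intrinsics, the compiler handles register allocation, so it does not clash with registers used by the surrounding C code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: copies n bytes from src to dst.
 * The buffers must not overlap, as with memcpy. */
void copy_bytes(void *dst, const void *src, size_t n)
{
    uint8_t *d = dst;
    const uint8_t *s = src;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Move 16 bytes at a time through one NEON q register. */
    while (n >= 16) {
        vst1q_u8(d, vld1q_u8(s));
        d += 16;
        s += 16;
        n -= 16;
    }
#endif
    /* Remaining tail, or the whole buffer when NEON is not enabled. */
    memcpy(d, s, n);
}
```

The C library's memcpy is often already tuned for the target, so a version like this is only worth keeping if measurements on the actual board show a gain.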

ARM NEON SIMD version 2

What is the difference between NEON SIMD and NEON SIMD version 2 as in Cortex A15?
It adds the SIMD fused multiply-add instruction (VFMA.F32) and also mandates the NEON half-precision extension. NEONv2 is supported in ARM Cortex-A7, ARM Cortex-A15, and Qualcomm Krait (not sure about ARM Cortex-A5).
It is not that much of a difference. From the ARM ARM (Architecture Reference Manual):
(in reverse order of definitions)
Advanced SIMDv2 is an OPTIONAL extension to the ARMv7-A and ARMv7-R profiles.
Advanced SIMDv2 adds both the Half-precision Extension and the fused
multiply-add instructions to the features of Advanced SIMDv1.
...
Advanced SIMDv1 can be extended by the OPTIONAL Half-precision Extension,
that provides conversion functions in both directions between half-precision
floating-point and single-precision floating-point.
...
The Advanced SIMD architecture extension, its associated implementations, and supporting software, are
commonly referred to as NEON™ technology.
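To see the practical effect of the fused multiply-add addition, a small sketch like the following can be compiled for both targets and the assembly compared. The names built_with_simd_fma and dot_f32 are hypothetical. The first function reports, via the ACLE macros __ARM_NEON and __ARM_FEATURE_FMA, whether the compiler was told it may use NEON together with fused multiply-add. The second is a multiply-accumulate loop that the compiler may contract into VFMA.F32 when that is allowed.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if this file was compiled for a target
 * with both NEON and fused multiply-add (the Advanced SIMDv2 feature set),
 * for example with -mfpu=neon-vfpv4 on GCC; 0 otherwise. */
int built_with_simd_fma(void)
{
#if (defined(__ARM_NEON) || defined(__ARM_NEON__)) && defined(__ARM_FEATURE_FMA)
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical example: dot product as a multiply-accumulate loop.
 * Each step acc += a[i] * b[i] is a candidate for a fused multiply-add. */
float dot_f32(const float *a, const float *b, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += a[i] * b[i];
    return acc;
}
```

Comparing the gcc -S output for -mfpu=neon against -mfpu=neon-vfpv4 should show whether vfma.f32 appears in place of a separate multiply and add, depending on the optimization level and floating-point contraction settings.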

Writing a piece of C code such that compiler uses SSE4.1 instruction for generating assembly Code

I want to write some C code that gcc can optimize when given the -msse4.1 flag. Basically, I want to check whether or not the compiler is taking advantage of SSE4.1 instructions.
There are many SSE4.1 instructions (http://en.wikipedia.org/wiki/SSE4#New_instructions), but I have not been able to write a fragment of C code for which any of those instructions appear in the generated assembly.
Thanks in advance.
From what I've seen, compilers rarely ever generate SSE4.1 instructions. I've seen a few cases where it will use the insert/extract instructions to pack data.
But for the most part, if you want to use the SSE4.1 instructions, you need to do them explicitly using intrinsics:
http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011/compiler_c/intref_cls/common/intref_bk_sse41.htm
I doubt GCC would emit SSE4.1 instructions that easily. But you could have a look at Intel SPMD Program Compiler:
Under the SPMD model, the programmer writes a program that mostly
appears to be a regular serial program, though the execution model is
actually that a number of program instances execute in parallel on the
hardware. (See a more detailed example that illustrates this concept.)
ispc compiles a C-based SPMD programming language to run on the SIMD
units of CPUs; it frequently provides a 3x or more speedup on CPUs
with 4-wide SSE units, without any of the difficulty of writing
intrinsics code.
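As an illustration of the intrinsics route, here is a minimal sketch built around _mm_mullo_epi32, which corresponds to the SSE4.1 instruction pmulld. The function name mul_i32 is made up. The intrinsic path is only compiled when the compiler defines __SSE4_1__ (which gcc does under -msse4.1); otherwise the plain loop handles everything, so the file builds either way.

```c
#include <stddef.h>
#include <stdint.h>
#ifdef __SSE4_1__
#include <smmintrin.h>  /* SSE4.1 intrinsics */
#endif

/* Hypothetical example: element-wise 32-bit integer multiply,
 * keeping the low 32 bits of each product. */
void mul_i32(int32_t *dst, const int32_t *a, const int32_t *b, size_t n)
{
    size_t i = 0;
#ifdef __SSE4_1__
    /* Four lanes per iteration; _mm_mullo_epi32 maps to pmulld. */
    for (; i + 4 <= n; i += 4) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_mullo_epi32(va, vb));
    }
#endif
    /* Scalar tail, or the whole array when SSE4.1 is not enabled. */
    for (; i < n; ++i)
        dst[i] = (int32_t)((uint32_t)a[i] * (uint32_t)b[i]);
}
```

Compiling with gcc -O2 -msse4.1 -S and searching the output for pmulld is a quick way to confirm that the SSE4.1 instruction was emitted.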

ARM Cortex-A8: How to make use of both NEON and vfpv3

I'm using a Cortex-A8 processor and I don't understand how to use the -mfpu flag.
The Cortex-A8 has both VFPv3 and NEON coprocessors. Previously I did not know how to use NEON, so I was only using
gcc -marm -mfloat-abi=softfp -mfpu=vfpv3
I now understand how SIMD processors work and have written some code using NEON intrinsics. To use the NEON coprocessor, my -mfpu flag has to change to -mfpu=neon, so my compiler command line looks like this:
gcc -marm -mfloat-abi=softfp -mfpu=neon
Does this mean that VFPv3 is no longer used? I have a lot of code that does not use NEON. Do those parts no longer make use of VFPv3?
If both NEON and VFPv3 are still used then I have no issue, but if only one of them is used, how can I make use of both?
NEON implies having the traditional VFP support too. VFP can be used for "normal" (non-vector) floating-point calculations. Also, NEON does not support double-precision FP so only VFP instructions can be used for that.
What you can do is add -S to gcc's command line and check the assembly. Instructions starting with V (e.g. vld1.32, vmla.f32) are NEON instructions, and those starting with F (fldd, fmacd) are VFP. (ARM docs now prefer the V prefix even for VFP instructions, and only older GCC versions still emit the F mnemonics. If your toolchain uses V mnemonics throughout, look at the operands instead: a .f64 suffix or sN registers indicate VFP.)
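To make the split concrete, here is a small sketch with one function of each kind. The names scale_f64 and add4_f32 are hypothetical. Compiled with -mfpu=neon, the double-precision function can only be implemented with VFP instructions, while the four-float addition uses a NEON intrinsic when NEON is enabled and a plain loop otherwise.

```c
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical example: double-precision math. NEON on ARMv7 has no
 * double-precision support, so this is done by VFP. */
double scale_f64(double x, double k)
{
    return x * k;
}

/* Hypothetical example: add four floats at once. */
void add4_f32(float dst[4], const float a[4], const float b[4])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* One NEON q-register addition covers all four lanes. */
    vst1q_f32(dst, vaddq_f32(vld1q_f32(a), vld1q_f32(b)));
#else
    /* Fallback when the file is not compiled with NEON enabled. */
    for (int i = 0; i < 4; ++i)
        dst[i] = a[i] + b[i];
#endif
}
```

In the -S output for a file like this, both kinds of instruction should appear side by side, which confirms that enabling NEON does not switch VFP off.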
