How to enable NEON acceleration with dlib on arm? - arm

I noticed that the latest version of dlib supports NEON acceleration on ARM. I tried it on an iPhone 6, and the time per frame (360x360) only improved from 35 ms to 28 ms. That seems abnormal, since I got a tenfold speedup on a laptop with SSE2 acceleration (640x480). Does anyone know the reason?

Pass the -mfpu=neon switch to GCC.
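To confirm that a build really has NEON switched on, one option is to test the ACLE macro __ARM_NEON that GCC and Clang define when NEON is enabled. The sketch below is only an illustration: the names HAVE_NEON, neon_enabled and add_floats are made up for this example and are not part of dlib. Note that -mfpu=neon applies to 32-bit ARM targets; on 64-bit ARM (AArch64) NEON is enabled by default and that switch is not used.

```c
#include <stddef.h>

/* __ARM_NEON is the ACLE feature macro; __ARM_NEON__ is an older spelling. */
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1   /* hypothetical helper macro for this example */
#else
#define HAVE_NEON 0
#endif

/* Returns 1 if this file was compiled with NEON enabled, 0 otherwise. */
int neon_enabled(void) { return HAVE_NEON; }

/* Adds two float arrays element by element.
   Uses NEON for blocks of four floats when available, scalar code otherwise. */
void add_floats(const float *a, const float *b, float *out, size_t n) {
    size_t i = 0;
#if HAVE_NEON
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(out + i, vaddq_f32(va, vb));
    }
#endif
    for (; i < n; ++i) {   /* scalar tail, or the whole array without NEON */
        out[i] = a[i] + b[i];
    }
}
```

If neon_enabled() returns 0 on the device, the compiler flags are not reaching the file that contains the SIMD code, and any speedup measured is not coming from NEON.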

Related

OpenCL: is my GPU not capable?

I have an old computer, so I don't know whether I can run OpenCL code on it. I checked my GPU and got this output:
When I execute OpenCL code, I get this error:
Finally, if I run clinfo, I get this:
I really don't know. Is it a problem with the libraries, or can my GPU not execute OpenCL code?
Your GPU predates OpenCL. Beignet supports Ivybridge and later (https://www.freedesktop.org/wiki/Software/Beignet/#supportedtargets).
Your CPU also predates OpenCL. Intel's first release of their CPU-only OpenCL driver requires SSE4.1, but your CPU only has SSE3. If you really really need to get OpenCL to work on this machine, you may be able to install an old version (2.8) of the AMD OpenCL CPU driver if you can find it. Quote from http://boinc.berkeley.edu/wiki/OpenclCpu:
Intel's OpenCL support requires the SSE4.1 CPU feature (BOINC's event log shows you the list of your CPU's features).
If your host does not have SSE4.1 support, then you can install the AMD APP SDK 2.8 and it will install the AMD OpenCL CPU driver. Note that the AMD APP SDK v2.9 will NOT install it. You have to use 2.8 or earlier as they now bundle the OpenCL driver with the video driver instead of with the APP SDK. As AMD only keeps the last several versions on their archive page, you may want to grab both the 32 and 64 bit version of the v2.8 APP SDK now and keep them in a safe place.
Or maybe POCL or FreeOCL might cover you for the CPU.
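The SSE4.1 requirement mentioned above can be checked from code as well as from the BOINC event log. The following is a rough sketch, assuming GCC or Clang on an x86 machine, where the builtins __builtin_cpu_init and __builtin_cpu_supports are available. The function name cpu_has_sse41 is invented for this example.

```c
/* Returns 1 if the CPU reports SSE4.1, 0 if it does not,
   and -1 if this build cannot tell (non-x86 target or unsupported compiler). */
int cpu_has_sse41(void) {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();                          /* populate CPU feature data */
    return __builtin_cpu_supports("sse4.1") ? 1 : 0;
#else
    return -1;
#endif
}
```

A result of 0 on the machine in question would match the answer above: Intel's CPU-only OpenCL driver would not be an option, and one of the other CPU drivers mentioned would be the thing to try.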

Do OpenACC and OpenMP support Arm Mali GPUs?

I would like to ask whether OpenACC or OpenMP supports Arm Mali GPUs. I use OpenMP 4.0, which supports GPU offloading, but I am not sure whether my code actually runs on the GPU. Do you have any idea how I can test it?
Neither is supported on Mali. Compute acceleration is available via OpenCL, or via compute shaders in OpenGL ES / Vulkan.
Either specification, or both, would work fine on Mali GPUs, but I'm not aware of any compiler that supports offloading to Mali. GCC or Clang would be your best bet, but I don't think either has a Mali target compiler.
The newly updated Arm C/C++ Compiler 21.1 with OpenMP 5.0 for Linux may support offloading to Arm Mali GPU targets.
OpenMP 5.0 features are supported by Arm C/C++ Compiler
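On the "how can I test it" part of the question: OpenMP 4.0 provides omp_get_num_devices() and omp_is_initial_device(), which can show whether a target region was offloaded or fell back to the host. A minimal sketch follows; the function names offload_device_count and target_region_ran_on_host are invented here, and the code assumes it is built with the compiler's OpenMP switch (for example -fopenmp). Without OpenMP it simply reports host execution.

```c
#if defined(_OPENMP) && _OPENMP >= 201307   /* OpenMP 4.0 or newer */
#include <omp.h>
#define HAVE_OMP_TARGET 1
#else
#define HAVE_OMP_TARGET 0
#endif

/* Number of offload devices the OpenMP runtime can see (0 if none). */
int offload_device_count(void) {
#if HAVE_OMP_TARGET
    return omp_get_num_devices();
#else
    return 0;
#endif
}

/* Returns 1 if a target region executed on the host, 0 if it ran on a device. */
int target_region_ran_on_host(void) {
    int on_host = 1;
#if HAVE_OMP_TARGET
    #pragma omp target map(from: on_host)
    {
        on_host = (omp_is_initial_device() != 0);
    }
#endif
    return on_host;
}
```

If offload_device_count() is 0, or target_region_ran_on_host() returns 1, the target regions are running on the CPU, which is what the answers above lead one to expect on a Mali system.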

ARMv8 backward compatibility with ARMv7 (Snapdragon 820 vs Cortex-A15)

I see that ARMv8 is merely an extension of the ARMv7 architecture, and that all code compiled for ARMv7 should run on ARMv8. I am interested in the compatibility in the other direction: will code that was compiled for ARMv8 run on ARMv7?
I have a specific case in mind: I would like to run comma.ai's Openpilot visiond binary, which was compiled for the OnePlus 3 smartphone (Qualcomm MSM8996 Snapdragon 820 CPU), on the Nvidia Jetson TK1 (NVIDIA Tegra K1 with a Cortex-A15 CPU). Will visiond run on the Jetson?
EDIT: There may be more in question than CPU compatibility, since visiond probably makes heavy use of the GPU on that phone. It will probably depend on whether they use standard mechanisms (OpenCL, NEON etc.) or have custom code for the Snapdragon's GPU. Even with OpenCL, the chance of compatibility across different hardware is probably quite low.
I believe that AArch32 userland is fully or very highly backwards compatible with ARMv7, i.e. userland programs compiled for ARMv7 should just work in AArch32, but I couldn't find a precise quote in the ARM manual.
AArch32 does have new instructions added over ARMv7, however; most of them seem to be functionality that ARMv8 added and the designers decided to expose in AArch32 as well. Therefore the compatibility does not hold in the other direction: programs compiled for AArch32 might not run on ARMv7.
I'm not sure about system land. See also: Does ARMv8 AArch32 mode has backward compatible with armv4 , armv5 or armv6?
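A practical first check for a case like this is whether the binary is a 32-bit ARM or a 64-bit AArch64 executable, since an AArch64 binary cannot run on an ARMv7-only core such as the Cortex-A15 at all. The question does not say which one visiond is. The sketch below reads this from the ELF header (e_machine is 40 for ARM and 183 for AArch64); the function name elf_arm_bits and its return codes are made up for this example.

```c
#include <stddef.h>

/* Inspects the first bytes of an ELF file.
   Returns 64 for AArch64, 32 for 32-bit ARM, 0 for another machine type,
   and -1 if the buffer is too short or is not an ELF header. */
int elf_arm_bits(const unsigned char *hdr, size_t len) {
    if (len < 20 || hdr[0] != 0x7F || hdr[1] != 'E' ||
        hdr[2] != 'L' || hdr[3] != 'F') {
        return -1;
    }
    /* e_machine is a 16-bit field at offset 18; hdr[5] (EI_DATA) gives the
       byte order: 1 = little-endian, 2 = big-endian. */
    unsigned machine = (hdr[5] == 2)
        ? (((unsigned)hdr[18] << 8) | hdr[19])
        : (((unsigned)hdr[19] << 8) | hdr[18]);
    if (machine == 183) return 64;   /* EM_AARCH64 */
    if (machine == 40)  return 32;   /* EM_ARM */
    return 0;
}
```

To use it, read the first 20 bytes of the binary with fread and pass them in. A result of 32 only means the CPU question is still open (the AArch32 caveats above apply); it says nothing about the GPU dependencies mentioned in the edit.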

Compiling half float neon instructions for iOS

The issue I am having is with some NEON instructions which I believe are supported on the ARMv7 architecture. I am using the default compiler (Apple LLVM 5.0). It recognises other NEON instructions, but it does not like the half-float instruction.
Here is the code:
vcvt.f32.f16 q0, d1
This compiles with GCC, but the Apple compiler rejects the instruction with the error: Instruction requires: half-float
Is there a compiler flag I can give to Xcode? I can't find out how to enable the half-float instructions by googling around.
Thanks!
The half-float format is not supported on all ARMv7 implementations. See the ARM manual here. It is required by VFPv4, so if your chip supports that, that's a good start. In general I would recommend run-time detection and dispatching. To enable the instruction at compile time, you need one of the floating-point support options; "fp16" is the keyword to look for, for example:
-mfpu=neon-fp16, if you are sure that your target supports it for NEON. I couldn't find all of the equivalent options for LLVM either, but I think they are generally compatible with the GCC options, which are listed in the GCC manual.
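The run-time detection and dispatching suggested above could look roughly like the sketch below: a portable software conversion as the fallback, a hardware path that is only compiled when the compiler advertises IEEE half-float support (the ACLE macro __ARM_FP16_FORMAT_IEEE), and a function pointer chosen once at startup. All function names are invented for this example. In particular, cpu_has_half_float is a placeholder that only looks at the compile-time macro; a real run-time check is platform-specific and is not shown.

```c
#include <stdint.h>
#include <string.h>

typedef float (*half_to_float_fn)(uint16_t);

/* Portable fallback: converts IEEE 754 half-precision bits to float. */
float half_to_float_soft(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;
    if (exp == 0 && mant == 0) {
        bits = sign;                                  /* signed zero */
    } else if (exp == 0) {
        uint32_t e = 113;                             /* 127 - 15 + 1 */
        while ((mant & 0x400u) == 0) {                /* normalise subnormal */
            mant <<= 1;
            e--;
        }
        bits = sign | (e << 23) | ((mant & 0x3FFu) << 13);
    } else if (exp == 0x1Fu) {
        bits = sign | 0x7F800000u | (mant << 13);     /* infinity or NaN */
    } else {
        bits = sign | ((exp + 112u) << 23) | (mant << 13);
    }
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

#if defined(__ARM_FP16_FORMAT_IEEE)
/* Hardware path: lets the compiler emit the half-float conversion. */
float half_to_float_hw(uint16_t h) {
    __fp16 v;
    memcpy(&v, &h, sizeof v);
    return (float)v;
}
#endif

/* PLACEHOLDER (assumption): stands in for a real run-time CPU feature query. */
int cpu_has_half_float(void) {
#if defined(__ARM_FP16_FORMAT_IEEE)
    return 1;
#else
    return 0;
#endif
}

/* Picks the conversion routine once; callers then use the returned pointer. */
half_to_float_fn select_half_to_float(void) {
    if (cpu_has_half_float()) {
#if defined(__ARM_FP16_FORMAT_IEEE)
        return half_to_float_hw;
#endif
    }
    return half_to_float_soft;
}
```

In a real project the hardware path would normally live in its own source file compiled with the fp16 option, so that the rest of the program never contains instructions an older chip cannot execute.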

Software simulation from ARM Cortex-M0

Is there a software simulator for ARM Cortex-M0 ?
I have a Thumb-only (not Thumb-2) instruction set simulator; go to GitHub and search for thumbulator. Depending on what you are trying to do, you could compile for Thumb for a while and switch to Thumb-2 later.
For ARM I found a behavioral Verilog model on a university site.
For Thumb-2 you might check whether QEMU supports it. I know there is support for the Stellaris Cortex-M3, so that may put you close enough.
There is no FOSS simulator. The ARM documentation license prohibits using the documentation to build a simulator. You have to pay ARM to use the documentation for simulation purposes, so all ARM simulators for the latest architectures are non-free.
You can download and use the free version of Keil uVision (limited to 32 KB).
IAR Embedded Workbench (www.iar.se) includes a simulator for Cortex cores. It is free (Kickstart version) up to 32 KB of code size.
