AMD xop check support - c

I have next problem:
I have some tests related to xop check with using some Bulldozer (xop) instructions.
And I must run this tests only on Bulldozer processors.
How can I check that my processor supports xop instruction at compile-time?
Language: C, Os: Linux;

You can write a program that checks CPUID and use the output of that program while compiling:
gcc $(cpuid_test) my_prog.c
where cpuid_test returns '-march=bdver1' or -DXOP_SUPPORT=1

You cannot test at compile time, but you can compile for AMD Bulldozer using:
$ gcc -march=bdver1 -mtune=bdver1 ...
See: http://gcc.gnu.org/gcc-4.6/changes.html
If your build machine is your target machine, look into /proc/cpuinfo.

If a source is compiled with -march=bdver1 (which enables XOP support among other things), the preprocessor macro __XOP__ will be defined to 1.
You can test at compile time for XOP with
#ifdef __XOP__
...XOP code path here...
#else
...non XOP code here...
#endif

Related

Converting C to nasm assembly in 16 bit [duplicate]

I am writing real mode function, which should be normal function with stackframes and so, but it should use %sp instead of %esp. Is there some way to do it?
GCC 5.2.0 (and possible earlier versions) support 16-bit code generation with the -m16 flag. However, the code will almost certainly rely on 32-bit processor features (such as 32-bit wide registers), so you should check the generated assembly carefully.
From the man pages:
The -m16 option is the same as -m32, except for that it outputs the
".code16gcc" assembly directive at the beginning of the assembly output
so that the binary can run in 16-bit mode.
Firstly, gcc could build 16bit code, because the linux kernel is go through realmode to protectmode, so it could even build 16bit c code.
Then, -m16 option is supported by GCC >= 4.9 and clang >= 3.5
gcc will ignore asm(".code16"),you can see it by -S output the assembly code surround by #APP #NO_APP
the linux kernel do the trick to compile 16bit c with a code16gcc.h(only have .code16gcc) pass to gcc compile params directly.
see Build 16-bit code with -m16 where possible, also see the linux kernel build Makefile
if you direct put the asm(".code16gcc"), see Writing 16-bit Code, it's not real 16bit code, call, ret, enter, leave, push, pop, pusha, popa, pushf, and popf instructions default to 32-bit size
GCC does not produce 8086 code. The GNU AS directive .code16gcc can be used to assemble the output of GCC to run in a 16-bit mode, put asm(".code16gcc") at the start of your C source, your program will be limited to 64Kibytes.
On modern GCC versions you can pass the -m16 argument to gcc which will produce code to run in a 16-bit mode. It still requires a 386 or later.
As far as I know, GCC does not support generation of code for 16-bit x86. For legacy bootloaders and similar purposes, you should write a small stub in assembly language to put the cpu in 32-bit mode and pass off execution to 32-bit code. For other purposes you really shouldn't be writing 16-bit code.

How to check that microprocessor is Altera Nios?

I writes some C-program code for Altera/Nios II microprocessor (uP). This code will be different with Altera Arm 9 microprocessor. So I need to write 2 different code pieces for different uP-s. How can I check in execution time which uP is present. Or more simple, current uP is Nios or not.
As the two processors are from different architectures, you will not be able to check which processor is running at run-time. You could do it at compile time, as you will have a specific define flag set by your toolchain (see https://sourceforge.net/p/predef/wiki/Architectures/). For Arm it should be __arm__ or similar, depending on the toolchain you are using for the HPS.
#ifdef __arm__
<specific code for HPS>
#else
<specific code for NIOS>
#endif /* __arm__ */
You can also look at the toolchain's defines using the c pre-processor command (cpp):
<toolchain>-cpp -dM /dev/null
Note: for Arm processor, the MIDR register could be used to know which type you are running and this one could be accessed at runtime. But when building for NIOS II, you would have a compilation error. So you need to use the preprocessor to call specific Arm register name and to remove the code when building for NiosII.
Presumably it will be compiled with a different compiler? These compilers will (very likely) have a #define of some sort which you can use to build different code for each one.
You can make the compiler dump all its default preprocessor defines using:
echo | ./nios2-elf-gcc.exe -dM -E -
This will in particular emit:
#define nios2 1

How to check with Intel intrinsics if AVX extensions is supported by the CPU?

I'm writing a program using Intel intrinsics. I want to use _mm_permute_pd intrinsic, which is only available on CPUs with AVX. For CPUs without AVX I can use _mm_shuffle_pd but according to the specs it is much slower than _mm_permute_pd. Do the header files for Intel intrinsics define constants that allow me to distinguish whether AVX is supported so that I can write sth like this:
#ifdef __IS_AVX_SUPPORTED__ // is there sth like this defined?
// use _mm_permute_pd
# else
// use _mm_shuffle_pd
#endif
? I have found this tutorial, which shows how to perform a runtime check but I need to do a static, compile-time check for the current machine.
GCC, ICC, MSVC, and Clang all define a macro __AVX__ which you can check. In fact it's the only SIMD constant defined by all those compilers (MSVC is the one that breaks the mold). This only tells you if your code was compiled with AVX support (e.g. -mavx with GCC or /arch:AVX with MSVC) it does not tell you if your CPU supports AVX. If you want to know if the CPU supports AVX you need to check CPUID. Here, asm-in-c-error, is an example to read CPUID from all those compilers.
To do this properly I suggest you make a CPU dispatcher.
Edit: In case anyone wants to know how to use the values from CPUID to find out if AVX is available see https://github.com/Mysticial/FeatureDetector
I assume you are using Intel C++ Compiler. In this case - yes, there are such macros: Intel C++ Compiler Reference Guide: __AVX__, __AVX2__.
P.S. Be aware that if you compile you application with AVX instruction set enabled it will fail on CPUs not supporting AVX. If you are going to distribute your software as source code package and compile on target machine - this is may be a viable solution. Otherwise you should check for AVX dynamically.
P.P.S. There are several options for ICC. Take a look at the following compiler options and also references from it to other.
It seems to me that the only way is to compile and run a program that identifies whether AVX is available. Then manually or automatically compile separate code with or without AVX functions. For VS 2013, I would used my code in commomAVX folder in the following to identify hasAVX (or not) and use this to execute one of two different BAT files to compile and link the appropriate program.
http://www.roylongbottom.org.uk/gigaflops-benchmarks.zip
My question was to help to identify a solution regarding the use of suitable compile options such as /arch:AVX.

What's the proper way to use different versions of SSE intrinsics in GCC?

I will ask my question by giving an example. Now I have a function called do_something().
It has three versions: do_something(), do_something_sse3(), and do_something_sse4(). When my program runs, it will detect the CPU feature (see if it supports SSE3 or SSE4) and call one of the three versions accordingly.
The problem is: When I build my program with GCC, I have to set -msse4 for do_something_sse4() to compile (e.g. for the header file <smmintrin.h> to be included).
However, if I set -msse4, then gcc is allowed to use SSE4 instructions, and some intrinsics in do_something_sse3() is also translated to some SSE4 instructions. So if my program runs on CPU that has only SSE3 (but no SSE4) support, it causes "illegal instruction" when calls do_something_sse3().
Maybe I have some bad practice. Could you give some suggestions? Thanks.
I think that the Mystical's tip is fine, but if you really want to do it in the one file, you can use proper pragmas, for instance:
#pragma GCC target("sse4.1")
GCC 4.4 is needed, AFAIR.
I think you want to build what's called a "CPU dispatcher". I got one working (as far as I know) for GCC but have not got it to work with Visual Studio.
cpu dispatcher for visual studio for AVX and SSE
I would check out Agner Fog's vectorclass and the file dispatch_example.cpp
http://www.agner.org/optimize/#vectorclass
g++ -O3 -msse2 -c dispatch_example.cpp -od2.o
g++ -O3 -msse4.1 -c dispatch_example.cpp -od5.o
g++ -O3 -mavx -c dispatch_example.cpp -od8.o
g++ -O3 -msse2 instrset_detect.cpp d2.o d5.o d8.o
Here is an example of compiling a separate object file for each optimization setting:
http://notabs.org/lfsr/software/index.htm
But even this method fails when gcc link time optimization (-flto) is used. So how can a single executable be built with full optimization for different processors? The only solution I can find is to use include directives to make the C files behave as a single compilation unit so that -flto is not needed. Here is an example using that method:
http://notabs.org/blcutil/index.htm
If you are using GCC 4.9 or above on an i686 or x86_64 machine, then you are supposed to be able to use intrinsics regardless of your -march=XXX and -mXXX options. You could write your do_something() accordingly:
void do_something()
{
byte temp[18];
if (HasSSE2())
{
const __m128i i = _mm_loadu_si128((const __m128i*)(ptr));
...
}
else if (HasSSSE3())
{
const __m128i MASK = _mm_set_epi8(12,13,14,15, 8,9,10,11, 4,5,6,7, 0,1,2,3);
_mm_storeu_si128(reinterpret_cast<__m128i*>(temp),
_mm_shuffle_epi8(_mm_loadu_si128((const __m128i*)(ptr)), MASK));
}
else
{
// Do the byte swap/endian reversal manually
...
}
}
You have to supply HasSSE2(), HasSSSE3() and friends. Also see Intrinsics for CPUID like informations?.
Also see GCC Issue 57202 - Please make the intrinsics headers like immintrin.h be usable without compiler flags. But I don't believe the feature works. I regularly encounter compile failures because GCC does not make intrinsics available.

SIMD version check

I am using Intel Core2Duo E4500 processor. It is supposed to have SSE3, SSSE3 facilities. But if I try to use them in programs it shows the following error "SSE3 instruction set not enabled"
Any ideas?
On Linux, have a look at the flags field of the output of cat /proc/cpuinfo
Try adding this gcc command line options:
-march=core2 -msse3
And probably is also a good idea to turn on sse optimizations for floating point operations:
-mfpmath=sse
Use CPU-Z to check for available instruction sets.
If you're using Visual Studio, there's an option in C/C++ -> Code Generation -> Enable Enhanced Instruction Set.
Here's how to enable it in gcc.
From the above link:
-msse3
-mssse3
If you compile on the same machine where you will be executing your code, with any recent gcc you should be able to use -march=native to take advantage of all your CPU features. It should tell you during compilation then, if you are using unsupported instructions in your asm.

Resources