I am using an Intel Core 2 Duo E4500 processor, which is supposed to support SSE3 and SSSE3. But when I try to use them in my programs, I get the following error: "SSE3 instruction set not enabled".
Any ideas?
On Linux, have a look at the flags field in the output of cat /proc/cpuinfo. SSE3 is listed there as pni and SSSE3 as ssse3.
Try adding these gcc command-line options:
-march=core2 -msse3
It is probably also a good idea to turn on SSE optimizations for floating-point operations:
-mfpmath=sse
Use CPU-Z to check for available instruction sets.
If you're using Visual Studio, there's an option in C/C++ -> Code Generation -> Enable Enhanced Instruction Set.
Here's how to enable it in gcc.
From the above link:
-msse3
-mssse3
If you compile on the same machine where you will be executing your code, with any recent gcc you should be able to use -march=native to take advantage of all your CPU features. It should then tell you during compilation if you are using unsupported instructions in your asm.
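One way to confirm that the flags actually took effect is to test the compiler's predefined macros. This is a minimal sketch, assuming GCC or clang, which define __SSE3__ and __SSSE3__ when the corresponding instruction set is enabled at compile time. The function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* 1 if this file was compiled with SSE3 enabled (e.g. -msse3 or -march=core2). */
int sse3_enabled(void)
{
#ifdef __SSE3__
    return 1;
#else
    return 0;
#endif
}

/* 1 if this file was compiled with SSSE3 enabled (e.g. -mssse3 or -march=core2). */
int ssse3_enabled(void)
{
#ifdef __SSSE3__
    return 1;
#else
    return 0;
#endif
}

/* Print the compile-time status of both instruction sets. */
void report_simd_flags(void)
{
    /* -mssse3 implies -msse3, so SSSE3 without SSE3 should never happen. */
    assert(!ssse3_enabled() || sse3_enabled());
    printf("SSE3:  %s\n", sse3_enabled() ? "enabled" : "not enabled");
    printf("SSSE3: %s\n", ssse3_enabled() ? "enabled" : "not enabled");
}
```

Calling report_simd_flags() from main in a build that uses -march=core2 should report both as enabled. If it does not, the options are probably not reaching the compiler.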
Related
I am writing a real-mode function, which should be a normal function with stack frames and so on, but it should use %sp instead of %esp. Is there some way to do it?
GCC 5.2.0 (and possibly earlier versions) supports 16-bit code generation with the -m16 flag. However, the code will almost certainly rely on 32-bit processor features (such as 32-bit wide registers), so you should check the generated assembly carefully.
From the man pages:
The -m16 option is the same as -m32, except for that it outputs the
".code16gcc" assembly directive at the beginning of the assembly output
so that the binary can run in 16-bit mode.
First, gcc can build 16-bit code: the Linux kernel goes through real mode before switching to protected mode, so its build compiles some C code for 16-bit mode.
The -m16 option is supported by GCC >= 4.9 and clang >= 3.5.
gcc ignores asm(".code16"); you can see this in the -S output, where the directive appears between #APP and #NO_APP.
The Linux kernel's trick for compiling 16-bit C was a header, code16gcc.h (which contains only the .code16gcc directive), passed to gcc directly on the command line.
See "Build 16-bit code with -m16 where possible", and also the Linux kernel build Makefile.
If you put asm(".code16gcc") directly in your source (see "Writing 16-bit Code"), the result is not real 16-bit code: the call, ret, enter, leave, push, pop, pusha, popa, pushf, and popf instructions default to 32-bit size.
GCC does not produce 8086 code. The GNU AS directive .code16gcc can be used to assemble the output of GCC to run in a 16-bit mode: put asm(".code16gcc") at the start of your C source. Your program will be limited to 64 KiB.
On modern GCC versions you can pass the -m16 argument to gcc which will produce code to run in a 16-bit mode. It still requires a 386 or later.
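As a minimal sketch of the kind of C that could be built this way, assuming a GCC version that accepts -m16: the function below is purely illustrative (its name and purpose are not from the answers here) and makes no library calls. Compiling it with gcc -m16 -ffreestanding -S and reading the output is one way to see the 32-bit registers and size prefixes mentioned above.

```c
#include <stddef.h>

/* Sum the bytes of a buffer modulo 256. It calls no library functions,
   so it can be built with -ffreestanding. */
unsigned char checksum8(const unsigned char *buf, size_t len)
{
    unsigned char sum = 0;
    size_t i;

    for (i = 0; i < len; i++)
        sum = (unsigned char)(sum + buf[i]);
    return sum;
}
```

The same source compiles unchanged as ordinary 32-bit or 64-bit code, which makes it easy to compare the -m16 assembly against the -m32 assembly.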
As far as I know, GCC does not support generation of code for 16-bit x86. For legacy bootloaders and similar purposes, you should write a small stub in assembly language to put the cpu in 32-bit mode and pass off execution to 32-bit code. For other purposes you really shouldn't be writing 16-bit code.
I will ask my question by giving an example. Now I have a function called do_something().
It has three versions: do_something(), do_something_sse3(), and do_something_sse4(). When my program runs, it will detect the CPU feature (see if it supports SSE3 or SSE4) and call one of the three versions accordingly.
The problem is: When I build my program with GCC, I have to set -msse4 for do_something_sse4() to compile (e.g. for the header file <smmintrin.h> to be included).
However, if I set -msse4, then gcc is allowed to use SSE4 instructions, and some intrinsics in do_something_sse3() are also translated to SSE4 instructions. So if my program runs on a CPU that has only SSE3 (but no SSE4) support, it causes an "illegal instruction" error when it calls do_something_sse3().
Maybe I have some bad practice. Could you give some suggestions? Thanks.
I think Mystical's tip is fine, but if you really want to do it in one file, you can use the appropriate pragmas, for instance:
#pragma GCC target("sse4.1")
GCC 4.4 is needed, AFAIR.
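The same idea can be scoped to a single function with the target attribute instead of a file-wide pragma. The following is a sketch under stated assumptions: GCC 4.9 or later (or a recent clang) on x86, so that <immintrin.h> can be included without -msse4.1, and __builtin_cpu_supports for the runtime check. The function names are hypothetical, and on other platforms only the scalar path is compiled.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_X86_TARGET_ATTR 1
#endif

/* Portable version, built with the default flags. Requires n >= 1. */
static int32_t max_i32_scalar(const int32_t *v, size_t n)
{
    int32_t m = v[0];
    size_t i;

    for (i = 1; i < n; i++)
        if (v[i] > m)
            m = v[i];
    return m;
}

#ifdef HAVE_X86_TARGET_ATTR
/* Only this function is allowed to contain SSE4.1 instructions.
   Requires n >= 1. */
__attribute__((target("sse4.1")))
static int32_t max_i32_sse41(const int32_t *v, size_t n)
{
    __m128i best = _mm_set1_epi32(v[0]);
    int32_t lanes[4];
    int32_t m;
    size_t i = 0;
    int k;

    /* _mm_max_epi32 is the SSE4.1 instruction in this loop. */
    for (; i + 4 <= n; i += 4)
        best = _mm_max_epi32(best, _mm_loadu_si128((const __m128i *)(v + i)));

    _mm_storeu_si128((__m128i *)lanes, best);
    m = lanes[0];
    for (k = 1; k < 4; k++)
        if (lanes[k] > m)
            m = lanes[k];
    for (; i < n; i++)          /* leftover elements */
        if (v[i] > m)
            m = v[i];
    return m;
}
#endif

/* Runtime dispatch: use the SSE4.1 version only if the CPU reports it. */
int32_t max_i32(const int32_t *v, size_t n)
{
#ifdef HAVE_X86_TARGET_ATTR
    if (__builtin_cpu_supports("sse4.1"))
        return max_i32_sse41(v, n);
#endif
    return max_i32_scalar(v, n);
}
```

Because the rest of the file is still compiled with the default flags, the compiler has no licence to emit SSE4.1 instructions outside max_i32_sse41(), which is the problem described in the question.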
I think you want to build what's called a "CPU dispatcher". I got one working (as far as I know) for GCC but have not got it to work with Visual Studio.
cpu dispatcher for visual studio for AVX and SSE
I would check out Agner Fog's vectorclass and the file dispatch_example.cpp
http://www.agner.org/optimize/#vectorclass
g++ -O3 -msse2 -c dispatch_example.cpp -od2.o
g++ -O3 -msse4.1 -c dispatch_example.cpp -od5.o
g++ -O3 -mavx -c dispatch_example.cpp -od8.o
g++ -O3 -msse2 instrset_detect.cpp d2.o d5.o d8.o
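The dispatching step itself can be sketched in plain C with a function pointer that starts out pointing at the dispatcher and is overwritten on the first call. This is only an illustration of the pattern, not the vectorclass code: it uses GCC's __builtin_cpu_supports in place of instrset_detect.cpp, and the three "versions" are hypothetical stand-ins that would normally come from the separately compiled object files.

```c
#include <stddef.h>

typedef long (*sum_fn)(const int *v, size_t n);

static long sum_loop(const int *v, size_t n)
{
    long s = 0;
    size_t i;

    for (i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Hypothetical stand-ins: in a real build each would live in its own
   object file compiled with -msse2, -msse4.1 or -mavx. */
static long sum_baseline(const int *v, size_t n) { return sum_loop(v, n); }
static long sum_sse41(const int *v, size_t n)    { return sum_loop(v, n); }
static long sum_avx(const int *v, size_t n)      { return sum_loop(v, n); }

static long sum_dispatch(const int *v, size_t n);

/* Starts out pointing at the dispatcher, which replaces it on first use. */
static sum_fn sum_ptr = sum_dispatch;

static long sum_dispatch(const int *v, size_t n)
{
    sum_fn chosen = sum_baseline;

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    if (__builtin_cpu_supports("avx"))
        chosen = sum_avx;
    else if (__builtin_cpu_supports("sse4.1"))
        chosen = sum_sse41;
#endif
    sum_ptr = chosen;       /* later calls skip the detection */
    return chosen(v, n);
}

/* Public entry point: always goes through the pointer. */
long sum(const int *v, size_t n)
{
    return sum_ptr(v, n);
}

/* 1 once the dispatcher has run and picked a version. */
int sum_is_resolved(void)
{
    return sum_ptr != sum_dispatch;
}
```

The detection cost is paid once; every later call to sum() is a single indirect call to the chosen version.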
Here is an example of compiling a separate object file for each optimization setting:
http://notabs.org/lfsr/software/index.htm
But even this method fails when gcc link time optimization (-flto) is used. So how can a single executable be built with full optimization for different processors? The only solution I can find is to use include directives to make the C files behave as a single compilation unit so that -flto is not needed. Here is an example using that method:
http://notabs.org/blcutil/index.htm
If you are using GCC 4.9 or above on an i686 or x86_64 machine, then you are supposed to be able to use intrinsics regardless of your -march=XXX and -mXXX options. You could write your do_something() accordingly:
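#include <emmintrin.h>  // SSE2: _mm_loadu_si128, _mm_storeu_si128
#include <tmmintrin.h>  // SSSE3: _mm_shuffle_epi8

// Assumes ptr (the input bytes) and the byte typedef are declared elsewhere.
void do_something()
{
    byte temp[18];
    // Test for SSSE3 first: every SSSE3 CPU also has SSE2, so testing
    // SSE2 first would make the SSSE3 branch unreachable.
    if (HasSSSE3())
    {
        const __m128i MASK = _mm_set_epi8(12,13,14,15, 8,9,10,11, 4,5,6,7, 0,1,2,3);
        _mm_storeu_si128(reinterpret_cast<__m128i*>(temp),
            _mm_shuffle_epi8(_mm_loadu_si128(reinterpret_cast<const __m128i*>(ptr)), MASK));
    }
    else if (HasSSE2())
    {
        const __m128i i = _mm_loadu_si128(reinterpret_cast<const __m128i*>(ptr));
        ...
    }
    else
    {
        // Do the byte swap/endian reversal manually
        ...
    }
}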
You have to supply HasSSE2(), HasSSSE3() and friends. Also see Intrinsics for CPUID like informations?.
Also see GCC Issue 57202 - Please make the intrinsics headers like immintrin.h be usable without compiler flags. But I don't believe the feature works. I regularly encounter compile failures because GCC does not make intrinsics available.
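For the detection helpers, one possible sketch uses __get_cpuid from the <cpuid.h> header that ships with GCC and clang. The bit positions are the standard CPUID leaf 1 feature bits (EDX bit 26 for SSE2, ECX bit 0 for SSE3, ECX bit 9 for SSSE3). The helper cpuid_leaf1 is a hypothetical name, and on non-x86 targets every check simply returns 0.

```c
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define HAVE_CPUID_H 1
#endif

/* Read the feature words of CPUID leaf 1. Returns 0 if CPUID is unavailable. */
static int cpuid_leaf1(unsigned int *ecx, unsigned int *edx)
{
#ifdef HAVE_CPUID_H
    unsigned int eax = 0, ebx = 0;
    return __get_cpuid(1, &eax, &ebx, ecx, edx);
#else
    (void)ecx;
    (void)edx;
    return 0;
#endif
}

/* SSE2: leaf 1, EDX bit 26. */
int HasSSE2(void)
{
    unsigned int ecx = 0, edx = 0;
    return cpuid_leaf1(&ecx, &edx) && ((edx >> 26) & 1u);
}

/* SSE3: leaf 1, ECX bit 0. */
int HasSSE3(void)
{
    unsigned int ecx = 0, edx = 0;
    return cpuid_leaf1(&ecx, &edx) && (ecx & 1u);
}

/* SSSE3: leaf 1, ECX bit 9. */
int HasSSSE3(void)
{
    unsigned int ecx = 0, edx = 0;
    return cpuid_leaf1(&ecx, &edx) && ((ecx >> 9) & 1u);
}
```

Since the result cannot change while the program runs, a real implementation would probably cache it rather than execute CPUID on every call.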
I have written some code in C and need to convert it to MIPS 64 assembly, with and without optimisation. I have been trying to do this with gcc, but it produces x86, which is far more complex. I have also been looking for a cross compiler but have not been able to get any to work. Any help or suggestions would be greatly appreciated.
Kind regards,
After downloading and installing Codesourcery codebench for MIPS, invoke the MIPS gcc cross compiler for the MIPS 64 revision 2 architecture as follows:
"C:\Program Files (x86)\CodeSourcery\Sourcery_CodeBench_Lite_for_MIPS_GNU_Linux\bin\mips-linux-gnu-gcc" -march=mips64r2 foo.c -S
This generates MIPS assembly source code in foo.s.
The documentation that was installed with codebench will tell you the other possible values for the -march option. Other gcc flags like -S and -O work as normal.
If you use a MIPS cross compiler, instead of a gcc that targets x86, you can complete this "conversion" (compilation). If you need to find a MIPS cross compiler (gcc), you can get one pre-built from codesourcery.com
The wording in your question seems to suggest you don't care as much that the output is MIPS, but rather you want the output to be less complex than x86. If this is the case, you might also examine the ARM output.
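To compare the output with and without optimisation, a small input file is easiest to read. The foo.c below is a hypothetical example, not the asker's code. Assuming the cross compiler from the answer above is on the PATH, it could be compiled twice, once with -march=mips64r2 -O0 -S and once with -march=mips64r2 -O2 -S, and the two .s files compared.

```c
/* foo.c: a small loop whose assembly is easy to follow at -O0 and -O2. */
long sum_to(long n)
{
    long total = 0;
    long i;

    for (i = 1; i <= n; i++)
        total += i;
    return total;
}
```

At -O0 the loop variables typically stay on the stack, while at -O2 they are usually kept in registers, so the optimised listing tends to be much shorter.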
I am looking for a real (source and generated code) example of software pipelining (http://en.wikipedia.org/wiki/Software_pipelining) produced by GCC. I tried to use -fmodulo-sched option when compiling for IA64 and PowerPC architectures by GCC versions 4.4-4.6 with no success.
Are you aware of such an example? The actual CPU architecture makes no difference.
Thank you
There are some tests from gcc testsuite for "-fmodulo-sched" option. You can check them:
http://www.google.com/codesearch/p?hl=en#OV-zwmL9vlY/gcc/gcc/testsuite/gcc.dg/sms-1.c&q=sms-6.c&d=4
See the files sms-1.c through sms-7.c.
They are also available at http://gcc.gnu.org/viewcvs/trunk/gcc/testsuite/gcc.dg/ (but GNU's viewcvs is very slow), where sms-8.c has been added as well.
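For a quick experiment outside the testsuite, a loop with independent iterations and a short multiply-then-add chain is the usual kind of candidate. The function below is a hypothetical sketch, not one of the sms-*.c tests. Whether GCC actually modulo-schedules it depends on the target and version. Compiling with -O2 -fmodulo-sched and, assuming your GCC provides it, the -fdump-rtl-sms dump option is one way to check what the pass did.

```c
/* out[i] = a[i] * k + b[i]: each iteration is independent, so loads for
   iteration i+1 can in principle overlap the arithmetic of iteration i. */
void scale_add(float *out, const float *a, const float *b, float k, int n)
{
    int i;

    for (i = 0; i < n; i++)
        out[i] = a[i] * k + b[i];
}
```

If the arrays may alias, the compiler has less freedom to reorder the loads and stores, so results can differ from a version that uses restrict-qualified pointers.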
I'm using a Cortex-A8 processor and I don't understand how to use the -mfpu flag.
The Cortex-A8 has both vfpv3 and neon co-processors. Previously I did not know how to use neon, so I was only using
gcc -marm -mfloat-abi=softfp -mfpu=vfpv3
Now that I understand how SIMD processors work, I have written some code using NEON intrinsics. To use the neon co-processor, my -mfpu flag has to change to -mfpu=neon, so my compiler command line looks like this:
gcc -marm -mfloat-abi=softfp -mfpu=neon
Now, does this mean that my vfpv3 is not used any more? I have lots of code that does not use NEON; do those parts no longer make use of vfpv3?
If both neon and vfpv3 are still used then I have no issues, but if only one of them is used how can I make use of both?
NEON implies having the traditional VFP support too. VFP can be used for "normal" (non-vector) floating-point calculations. Also, NEON does not support double-precision FP so only VFP instructions can be used for that.
What you can do is add -S to gcc's command line and check the assembly. Instructions starting with V (e.g. vld1.32, vmla.f32) are NEON instructions, and those starting with F (fldd, fmacd) are VFP. (Although ARM docs now prefer using the V prefix even for VFP instructions, GCC does not do that.)
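A small test file makes this check concrete. The sketch below is hypothetical (the function names are not from the question): it puts a NEON-intrinsic loop and a double-precision function in the same file, so that compiling it with gcc -marm -mfloat-abi=softfp -mfpu=neon -S should show both kinds of instruction in one listing. The NEON path is guarded by the __ARM_NEON / __ARM_NEON__ macros, so the file still builds with a plain scalar loop on other targets.

```c
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define USE_NEON 1
#endif

/* Single precision: four floats per step with NEON when available. */
void add_f32(float *out, const float *a, const float *b, size_t n)
{
    size_t i = 0;

#ifdef USE_NEON
    for (; i + 4 <= n; i += 4)
        vst1q_f32(out + i, vaddq_f32(vld1q_f32(a + i), vld1q_f32(b + i)));
#endif
    for (; i < n; i++)          /* leftover (or all) elements, scalar */
        out[i] = a[i] + b[i];
}

/* Double precision: NEON on the Cortex-A8 cannot do this, so with
   -mfpu=neon it is expected to compile to VFP instructions. */
double scale_f64(double x, double k)
{
    return x * k;
}
```

In the generated assembly, the body of add_f32() should contain the NEON loads, adds and stores, while scale_f64() should contain a VFP double-precision multiply. The exact mnemonics may vary with the GCC version.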