I am trying to port some code that used to run on Windows to openSUSE 12.1, but I am having a problem compiling a section of the code that uses SSE instructions.
The openSUSE machine runs on an Intel Core i7 with these CPU flags:
fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt aes lahf_lm ida arat dts tpr_shadow vnmi flexpriority ept vpid.
Most of the SSE intrinsics compile fine, but the compiler does not seem to recognize _mm_dp_ps.
It is also complaining about __builtin_ia32_pshufd and _mm_cvtepu8_epi32.
Can anyone please help me? What am I missing?
_mm_dp_ps and _mm_cvtepu8_epi32 are both SSE4.1 - so you need:
#include <smmintrin.h> // SSE 4.1 intrinsics
and you also need to compile with:
$ gcc -msse4.1 ...
I'm not sure if SO is the best place to ask this question but...
I've compiled my C program on Ubuntu Linux (16.04) using
gcc -o MM1.x86 -O3 -static -mavx2 -g -Wall MM1.c
Running the file command
file MM1.x86
MM1.x86: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), statically linked, for GNU/Linux 2.6.32, BuildID[sha1]=078be440b466ba97bd4bf468b24eee35f8c6a01f, not stripped
But I've also cross compiled the same source file for ARM using
arm-linux-gnueabi-gcc -o MM1.ARM -O3 -static -g -Wall MM1.c
Running file
file MM1.ARM
MM1.ARM: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), statically linked, for GNU/Linux 3.2.0, BuildID[sha1]=b6582e1073682d41eb5262ad5393cebb8578e05d, not stripped
After running the ARM-compiled program, I was wondering how it can even run on x86_64 if it was compiled for the ARM architecture.
lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 60
Model name: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
Stepping: 3
CPU MHz: 886.640
CPU max MHz: 4000.0000
CPU min MHz: 800.0000
BogoMIPS: 7183.90
Virtualisation: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 8192K
NUMA node0 CPU(s): 0-7
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm epb tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm ida arat pln pts
I did notice that the ARM version runs slightly slower. Why is that?
I would like to use the Intel C Compiler due to its superior vectorization abilities. However, it understandably has no -march=bdver2 flag which is what I would use on gcc for my AMD FX-8350 CPU. It does have -xavx but I am not sure what other flags to use.
What are the optimal compiler flags for the AMD FX-8350 CPU using the Intel C Compiler?
cat /proc/cpuinfo |grep flags gives:
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt
pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid
aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt
aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a
misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce
nodeid_msr tbm topoext perfctr_core perfctr_nb cpb hw_pstate vmmcall
bmi1 arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid
decodeassists pausefilter pfthreshold
I don't fully understand the manual/help for libcurl. I'm trying to set up a cross-compiler together with the libcurl library. I installed a cross-compiler on my server with the help of this video and can run it with arm-linux-gnueabihf-gcc hello_simple.c. I could compile simple C code like printf("Hello World");. After that I tried to install libcurl and read that I need to compile the library myself and use the configure script to set the build and host. I tried different configurations, such as:
sudo ./configure --build=i586-pc-linux-gnu --host=arm-linux --target=arm-linux --prefix=/home/nevadmin/dev/gcc
but none of them worked. I think I'm making a mistake somewhere. This is the output after running configure:
curl version: 7.46.0
Host setup: arm-unknown-linux-gnu
Install prefix: /home/nevadmin/dev/gcc
Compiler: gcc
SSL support: no (--with-{ssl,gnutls,nss,polarssl,mbedtls,cyassl,axtls,winssl,darwinssl} )
SSH support: no (--with-libssh2)
zlib support: no (--with-zlib)
GSS-API support: no (--with-gssapi)
TLS-SRP support: no (--enable-tls-srp)
resolver: default (--enable-ares / --enable-threaded-resolver)
IPv6 support: no (--enable-ipv6)
Unix sockets support: enabled
IDN support: no (--with-{libidn,winidn})
Build libcurl: Shared=yes, Static=yes
Built-in manual: enabled
--libcurl option: enabled (--disable-libcurl-option)
Verbose errors: enabled (--disable-verbose)
SSPI support: no (--enable-sspi)
ca cert bundle: no
ca cert path: no
LDAP support: no (--enable-ldap / --with-ldap-lib / --with-lber-lib)
LDAPS support: no (--enable-ldaps)
RTSP support: enabled
RTMP support: no (--with-librtmp)
metalink support: no (--with-libmetalink)
PSL support: no (libpsl not found)
HTTP2 support: disabled (--with-nghttp2)
Protocols: DICT FILE FTP GOPHER HTTP IMAP POP3 RTSP SMTP TELNET TFTP
My server's cpuinfo:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 44
model name : Intel(R) Xeon(R) CPU E5649 @ 2.53GHz
stepping : 2
microcode : 0x15
cpu MHz : 2533.423
cache size : 12288 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss ht syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology tsc_reliable nonstop_tsc aperfmperf pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 popcnt aes hypervisor lahf_lm ida arat dtherm
bogomips : 5066.84
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
And my controller:
Processor : ARMv7 Processor rev 2 (v7l)
BogoMIPS : 298.80
Features : swp half thumb fastmult vfp edsp thumbee neon vfpv3 tls
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x3
CPU part : 0xc08
CPU revision : 2
Using, for example, gcc hello_world.c -o hello_world compiles for the amd64 processor architecture and not ARM. It seems I'm missing the linking to the libcurl library? I appreciate any help. Sorry for my English; it's not my native language.
Linking with -L/home/nevadmin/dev/gcc -lcurl works and I can compile C code with libcurl, but it is still compiled for amd64 and not ARM. :/
You need to set the proper compiler to use when you configure curl. If you look at curl's configure output from
./configure --help
You'll see this at the end:
...
Some influential environment variables:
CC C compiler command
CFLAGS C compiler flags
LDFLAGS linker flags, e.g. -L<lib dir> if you have libraries in a
nonstandard directory <lib dir>
LIBS libraries to pass to the linker, e.g. -l<library>
CPPFLAGS (Objective) C/C++ preprocessor flags, e.g. -I<include dir> if
you have headers in a nonstandard directory <include dir>
CPP C preprocessor
Use these variables to override the choices made by `configure' or to help
it to find libraries and programs with nonstandard names/locations.
You can set these variables with a command line like:
CC=arm-linux-gnueabihf-gcc ./configure --build=i586-pc-linux-gnu --host=arm-linux --target=arm-linux --prefix=/home/nevadmin/dev/gcc
(You don't need the "sudo" to configure.)
gcc by itself will give you an AMD executable. You need to use arm-linux-gnueabihf-gcc to get an ARM executable.
Notice that in your curl configuration output it says
Compiler: gcc
That's why you're getting a curl library built for your AMD64 host. Bibliothek is called "library" in English. ;-)
If you continue to have problems cross compiling for ARM you could take a look at the binary releases of the cross compilation tool chain ELLCC. As of version 0.1.21 it comes with several pre-compiled libraries, including curl. Here's the ChangeLog.
My CPU has the following features:
cat /proc/cpuinfo
Processor : ARMv7 Processor rev 4 (v7l)
processor : 0
BogoMIPS : 1192.96
processor : 1
BogoMIPS : 1197.05
Features : swp half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part : 0xc07
CPU revision : 4
Hardware : sun7i
Revision : 0000
And gcc sets
-march=armv7ve -mfloat-abi=hard -mfpu=vfpv3-d16 -meabi=5
options for
cat main.c
#include <stdio.h>
int main(void)
{
    printf("Hello World!\n");
    return 0;
}
compiled with
gcc -march=native -mtune=native -Q -v main.c
Isn't neon-vfpv4, which the CPU features indicate is supported, superior to vfpv3-d16, which gcc selects?
I found only a vague explanation of what vfpv3-d16 is in ARM's documentation and nothing on neon-vfpv4.
I'm using gcc 4.9.1.
-march and -mtune (or -mcpu as a shorthand for both) only control the CPU options for instruction selection and scheduling. As an example, with a GCC 4.8-based cross-toolchain, when I do this:
arm-linux-gnueabihf-gcc -mcpu=arm250 -v -c test.c
I get this:
...
COLLECT_GCC_OPTIONS='-mcpu=arm250' '-v' '-c' '-mfloat-abi=hard'
'-mfpu=vfpv3-d16' '-mthumb' '-mtls-dialect=gnu'
...
which is clearly nonsense - the ARM250 predates VFP (and even Thumb) by a long way - because for any unspecified options it's just passing through whatever was configured as the default:
...
Configured with:
... --with-arch=armv7-a --with-tune=cortex-a9 --with-fpu=vfpv3-d16 ...
... --with-mode=thumb --with-float=hard
Your Cortex-A7 indeed supports full VFPv4 and NEON, so passing -mfpu=neon-vfpv4 to override the default would be the right thing to do. Unfortunately there doesn't seem to be an equivalent -mfpu=native option (at least documented - I don't have a native toolchain handy to check).
I am using OProfile to collect some performance data, but I ran into trouble.
Here is my shell session:
~ # rm -f /root/.oprofile/daemonrc
~ # opcontrol --setup --no-vmlinux
~ # opcontrol --init
~ # opcontrol --reset
~ # opcontrol --start
~ # opcontrol --status
Daemon running: pid 14909
Separate options: none
vmlinux file: none
Image filter: none
Call-graph depth: 0
~ # opcontrol --shutdown
Stopping profiling.
Killing daemon.
~ # opreport
error: no sample files found: profile specification too strict?
~ # tree /var/lib/oprofile/
/var/lib/oprofile/
├── abi
├── complete_dump
├── jitdump
├── opd_pipe
└── samples
├── current
│ └── stats
│ ├── bt_lost_no_mapping
│ ├── cpu0
│ │ ├── backtrace_aborted
│ │ ├── sample_invalid_eip
│ │ ├── sample_lost_overflow
│ │ └── sample_received
│ ├── event_lost_overflow
│ ├── multiplex_counter
│ ├── sample_lost_no_mapping
│ └── sample_lost_no_mm
└── oprofiled.log
5 directories, 13 files
~ # dmesg |grep oprofile
oprofile: using NMI interrupt.
~ # uname -a
Linux localhost.localdomain 2.6.32-220.4.2.el6.x86_64 #1 SMP Tue Feb 14 04:00:16 GMT 2012 x86_64 x86_64 x86_64 GNU/Linux
~ # cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 44
model name : Intel(R) Xeon(R) CPU E5620 @ 2.40GHz
stepping : 2
cpu MHz : 2400.085
cache size : 12288 KB
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc up arch_perfmon pebs bts xtopology tsc_reliable nonstop_tsc aperfmperf unfair_spinlock pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 popcnt aes hypervisor lahf_lm ida arat epb dts
bogomips : 4800.17
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
Some CPU types do not provide the hardware support needed to use the hardware performance counters.
On these machines, OProfile falls back to using the timer interrupt, or the real-time clock interrupt, to collect samples.
You can force use of the timer interrupt with the timer=1 module parameter. If OProfile was built as a kernel module, you must pass the 'timer=1' parameter with the modprobe command. Do this before executing 'opcontrol --init', or edit the opcontrol command's invocation of modprobe to pass the 'timer=1' parameter:
modprobe oprofile timer=1
Then continue your profiling procedure.
I ran into a similar problem on a RHEL6-based distribution. At some point I started using perf, with which I was able to get profiler reports and annotated source code.