Is it possible to inline a subroutine from the Intel MKL library?

My code calls the Intel MKL dgemv many times. Is it possible to inline the MKL dgemv?

In general, compilers will not inline BLAS calls at the source level, but the linker can do it under some circumstances (for example, via link-time optimization). However, I've heard that Intel is providing some support for inlining MKL in C code in the version 15 release of the compiler, but you'll have to refer to the official documentation for confirmation and details.
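For reference, the call in question looks roughly like this; a minimal sketch using MKL's CBLAS interface, where the 3x3 sizes and values are made up:

    #include <stdio.h>
    #include <mkl.h>

    int main(void)
    {
        double A[9] = {1, 2, 3,
                       4, 5, 6,
                       7, 8, 9};   /* 3x3 matrix, row-major */
        double x[3] = {1, 1, 1};
        double y[3] = {0, 0, 0};

        /* y = 1.0*A*x + 0.0*y; this is the library call the question
           would like to inline, but it normally remains an opaque call
           into MKL unless the toolchain can see into the library. */
        cblas_dgemv(CblasRowMajor, CblasNoTrans, 3, 3,
                    1.0, A, 3, x, 1, 0.0, y, 1);

        printf("%g %g %g\n", y[0], y[1], y[2]);   /* prints 6 15 24 */
        return 0;
    }

Nothing about the call site itself changes whether or not inlining happens; it is purely a question of what the compiler and linker can see.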

Related

_BitScanForward64 using MinGW on Windows

I am trying to use the _BitScanForward64 intrinsic (https://software.intel.com/sites/landingpage/IntrinsicsGuide/#expand=381,421,373&cats=Bit%20Manipulation) using MinGW 64 on Windows (GCC 4.8.1).
I tried:
#include "immintrin.h"
#include "x86intrin.h"
#include "xmmintrin.h"
but it still doesn't see the function (according to the Intel guide it should be in "immintrin.h").
How do I get it to work using MinGW64 on Windows?
GCC does not use an intrinsic for this. It uses a built-in function, __builtin_ffs. Wikipedia has a nice summary for each compiler.
The reason Intel lists this intrinsic, I think, is that the Intel C++ compiler tries to support the same intrinsics Microsoft created, at least when used in Visual Studio on Windows. This unfortunately means that some intrinsics are not defined for every compiler. Even worse is that sometimes they are defined differently. For example, Intel's definition of _addcarry_u64 disagrees with Microsoft's definition, but looking at the assembly shows that Intel uses Microsoft's definition while GCC and Clang use Intel's definition.
For most x86 SIMD intrinsics GCC, Intel, Clang, and MSVC agree (with a few exceptions coming from MSVC) but for other intrinsics you have to get used to them being only defined in some compilers or being defined differently or having to use builtin functions or a library function.
The Intel compiler even has its own intrinsic for this: _bit_scan_forward.
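If you just need _BitScanForward64-style behaviour under MinGW, a minimal sketch is to wrap GCC's __builtin_ctzll; the wrapper name bit_scan_forward64 is made up:

    #include <stdint.h>
    #include <stdio.h>

    /* Mirrors _BitScanForward64: returns nonzero and stores the index of
       the lowest set bit, or returns 0 when no bit is set.
       __builtin_ctzll counts trailing zeros and is undefined for 0,
       hence the explicit guard. */
    static int bit_scan_forward64(uint64_t x, unsigned *index)
    {
        if (x == 0)
            return 0;
        *index = (unsigned)__builtin_ctzll(x);
        return 1;
    }

    int main(void)
    {
        unsigned idx;
        if (bit_scan_forward64(0x48ULL, &idx))    /* 0x48 = 0b1001000 */
            printf("lowest set bit: %u\n", idx);  /* prints 3 */
        return 0;
    }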

Compiling with a parallel compiler for use with Ruby?

I have a (semi) theoretical question.
Is it possible to compile C code with a parallel specific compiler such as OpenMP, CUDA, etc., to write a C extension to the Ruby language that operates better (theoretically) for distributed machines?
If what you mean is whether you can compile parallel C code as a Ruby extension, then I think you mostly should be able to. For example, a numeric extension to Ruby that does matrix multiplication can call into Intel MKL's DGEMM routine, and MKL is parallelised internally with OpenMP.
You should only be careful not to mix, in the same program, extensions that use different OpenMP runtimes. Take for example GCC, the Intel C/C++ compiler, and Oracle's Solaris Studio. Each of them has its own implementation of the OpenMP runtime. These implementations are largely incompatible with one another, and mixing them in the same executable (either by static linking or by dynamically loading modules at runtime) is a recipe for disaster.
Another thing to check is whether the Ruby runtime is thread-safe and whether it can be called from multithreaded code. This remark is not relevant if you do not call the Ruby runtime from inside OpenMP parallel regions.
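As an illustration, here is a minimal sketch of a Ruby C extension whose hot loop is parallelised with OpenMP (compile with -fopenmp); the extension name parsum and the function sum_squares are made up, and note that no Ruby API is touched inside the parallel region:

    #include <ruby.h>

    static VALUE sum_squares(VALUE self, VALUE n_val)
    {
        long n = NUM2LONG(n_val);
        double sum = 0.0;
        long i;
        (void)self;

        /* The parallel region touches only plain C data; calling back
           into the Ruby runtime from these threads would be unsafe. */
        #pragma omp parallel for reduction(+:sum)
        for (i = 0; i < n; i++)
            sum += (double)i * (double)i;

        return DBL2NUM(sum);
    }

    void Init_parsum(void)
    {
        VALUE mod = rb_define_module("ParSum");
        rb_define_module_function(mod, "sum_squares", sum_squares, 1);
    }

From Ruby this would then be called as ParSum.sum_squares(1000000) after require "parsum".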
As there is a single CUDA vendor, one should only be concerned with version mismatches when it comes to CUDA-enabled extensions.

GCC Error while compiling for ARM

I am getting the following error while trying to compile some code for an ARM Cortex-M4
using
gcc -mcpu=cortex-m4 arm.c
`-mcpu=' is deprecated. Use `-mtune=' or '-march=' instead.
arm.c:1: error: bad value (cortex-m4) for -mtune= switch
I was following GCC 4.7.1 ARM options. Not sure whether I am missing some critical option. Any kickstart for using GCC for ARM will also be really helpful.
As starblue implied in a comment, that error is because you're using a native compiler built for compiling for x86 CPUs, rather than a cross-compiler for compiling to ARM.
GCC only supports a single general architecture type in any given compiler binary -- so, although the same copy of GCC can compile for both 32-bit and 64-bit x86 machines, you can't compile to both x86 and ARM with the same copy of GCC -- you need an ARM-specific GCC.
(As auselen suggests, getting a pre-built one will save you quite a lot of work, even if you're only using it as a starting point to get things set up. You need to have GCC, binutils, and a C library as a minimum, and those are all separate open-source projects that the pre-built versions have already done the work of combining. I'll recommend Sourcery CodeBench Lite since that's the one my company makes and I do think it's a fairly good one.)
As the error message says, -mcpu is deprecated, and you should use the other options stated. However, "deprecated" simply means that its use may not continue to be supported; it will still work.
The ARM Cortex-M4 implements the ARMv7E-M architecture, so you should use -march=armv7-m (the documentation does not specifically list armv7e-m, but that may have been added since the documentation was last updated). The E is essentially the difference between the M3 and M4 - the DSP instructions - so with -march=armv7-m the compiler will not generate code that takes advantage of these instructions. Using ARM's Cortex-M DSP library is probably the best way to use these instructions to benefit your application. If your part has an FPU, then other options will be needed to enable code generation for it.
Like others already pointed out, you are using a compiler for your host machine, and you need a compiler that generates code for your target processor instead (a cross compiler). Like @Brooks suggested, you can use a pre-built toolchain, but if you want to roll your own cross-compiler, libc, and binutils, there is a nice tool called Crosstool-NG. It greatly simplifies the process of building a cross-compiler optimized to generate code for a specific processor, so you're not stuck with a generic prebuilt toolchain, which usually builds code for a family of compatible processors. For example, you could tune the toolchain to generate assembly for your specific target, or to emit floating-point code for a hardware FPU specific to your processor, instead of using only the software floating-point routines that most pre-built toolchains default to.
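For the record, once a suitable cross toolchain is on your PATH, the compile line from the question would look something like this (assuming the common arm-none-eabi naming; -c is used because a full link additionally needs a C library and startup code for the target):

    arm-none-eabi-gcc -mcpu=cortex-m4 -mthumb -c arm.c -o arm.o

The -mthumb flag matters here because Cortex-M cores execute only Thumb code.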

C libraries are distributed along with compilers or directly by the OS?

As per my understanding, C libraries must be distributed along with compilers. For example, GCC must be distributing its own C library and Forte must be distributing its own C library. Is my understanding correct?
But can a user library compiled with GCC work with the Forte C library? If both C libraries are present in a system, which one will get invoked at run time?
Also, if an application links to multiple libraries, some compiled with GCC and some with Forte, will the libraries compiled with GCC automatically link to the GCC C library, and will it behave likewise for Forte?
GCC comes with libgcc which includes helper functions to do things like long division (or even simpler things like multiplication on CPUs with no multiply instruction). It does not require a specific libc implementation. FreeBSD uses a BSD derived one, glibc is very popular on Linux and there are special ones for embedded systems like avr-libc.
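To make that concrete, here is a sketch of the kind of code that drags in a libgcc helper; built for a 32-bit x86 target, GCC compiles the division below into a call to __udivdi3 from libgcc:

    #include <stdint.h>

    /* A 32-bit target has no single instruction for a full 64-by-64-bit
       divide, so the compiler emits a call into its runtime support
       library instead of inline code. */
    uint64_t div64(uint64_t a, uint64_t b)
    {
        return a / b;
    }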
Systems can have many libraries installed (libc and other) and the rules for selecting them vary by OS. If you link statically it's entirely determined at compile time. If you link dynamically there are versioning and path rules which come into play. Generally you cannot mix and match at runtime because of bits of the library (from headers) that got compiled into the executable.
The compiled output of two compilers should be compatible if they both follow the ABI for the platform. That's the purpose of defining specific register usage and calling conventions.
As far as Solaris is concerned, your assumption is incorrect. Being the interface between the kernel and userland, the standard C library is provided with the operating system. That means whatever C compiler you use (Forte/Studio or gcc), the same libc is always used. In any case, the rare ports of the GNU standard C library (glibc) to Solaris are quite limited and probably lack too many features to be usable. http://csclub.uwaterloo.ca/~dtbartle/opensolaris/
None of the other answers (yet) mentions an important feature that promotes interworking between compilers and libraries - the ABI or Application Binary Interface. On Unix-like machines, there is a well documented ABI, and the C compilers on the system all follow the ABI. This allows a great deal of mix'n'match. Normally, you use the system-provided C library, but you can use a replacement version provided with a compiler, or created separately. And normally, you can use a library compiled by one compiler with programs compiled by other compilers.
Sometimes, one compiler uses a runtime support library for some operations - perhaps 64-bit arithmetic routines on a 32-bit machine. If you use a library built with this compiler as part of a program built with another compiler, you may need to link this library. However, I've not seen that as a problem for a long time - with pure C.
Linking C++ is a different matter. There isn't the same degree of interworking between different C++ compilers - they disagree on details of class layout (vtables, etc) and on how exception handling is done, and so on. You have to work harder to create libraries built with one C++ compiler that can be used by others.
Only a few parts of the C library are mandatory, in the sense that they are required even for a freestanding environment. Such an environment only has to provide what is necessary for the headers
<float.h>, <iso646.h>, <limits.h>, <stdarg.h>, <stdbool.h>, <stddef.h>, and <stdint.h>
These headers mostly define types and macros, so few if any functions must actually be implemented.
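As an illustration, a translation unit like the following relies only on freestanding headers; a minimal sketch that could be compiled with gcc -ffreestanding -c:

    #include <stdint.h>
    #include <stddef.h>

    /* Only freestanding headers are used: <stdint.h> and <stddef.h>
       provide the types, and no C library function is called. */
    uint32_t checksum(const uint8_t *buf, size_t len)
    {
        uint32_t sum = 0;
        for (size_t i = 0; i < len; i++)
            sum += buf[i];
        return sum;
    }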
The other type of environment is called a "hosted" environment. As the name indicates, it supposes that there is some entity that "hosts" the running program, usually the OS. So usually the C library is provided by that hosting environment, but, as Ben said, on different systems there may even be alternative implementations.
Forte? That's really old.
The preferred compilers and developer tools for Solaris are all contained in Oracle Solaris Studio.
C/C++/Fortran with a debugger, performance analyzer, and IDE based on NetBeans, and lots of libraries.
http://www.oracle.com/technetwork/server-storage/solarisstudio/index.html
It's (still) free, too.
I think there is a bit of confusion about terms: a library is NOT the same thing as a DLL or .so. In the real sense of programming languages, libraries are compiled code the LINKER will merge with our binary (.o). So the linker (or the compiler via some directives...) can manage them, but the OS can't; the concept is simply NOT related to the OS.
We are used to thinking that OSes are written in C and that we can rebuild the OS using gcc and libraries or similar, but C is NOT Linux/Unix.
We can also have an OS written in Pascal (Mac OS was, many years ago) AND use libraries with our favorite C compiler, OR have an OS written in assembly (even if not entirely, as in the first Windows versions), but we must have C libraries to build an executable.

What GNU C extensions are available that aren't trivial to implement in C99?

How come the Linux kernel can compile only with GCC? What GNU C extensions are really necessary for some projects and why?
Here are a couple of gcc extensions the Linux kernel uses (a short sketch exercising a few of them follows the list):
inline assembly
gcc builtins, such as __builtin_expect, __builtin_constant_p, __builtin_return_address
function attributes to specify e.g. what registers to use (e.g. __attribute__((regparm(0))), __attribute__((packed, aligned(PAGE_SIZE))))
specific code depending on gcc predefined macros (e.g. workarounds for certain gcc bugs in certain versions)
ranges in switch cases (case 8 ... 15:)
Here are a few more: http://www.ibm.com/developerworks/linux/library/l-gcc-hacks/
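Here is the promised sketch of the first two items plus the case ranges; it compiles with gcc in its default GNU dialect (not with -std=c99), and the rdtsc use is x86-specific:

    #include <stdio.h>

    static int classify(int c)
    {
        switch (c) {
        case '0' ... '9':           /* case range: a GNU extension */
            return 0;
        case 'a' ... 'z':
        case 'A' ... 'Z':
            return 1;
        default:
            return 2;
        }
    }

    int main(void)
    {
        unsigned int lo, hi;

        /* inline assembly: read the x86 time-stamp counter into EDX:EAX */
        __asm__ volatile ("rdtsc" : "=a" (lo), "=d" (hi));

        /* __builtin_expect: hint that this branch is rarely taken */
        if (__builtin_expect(lo == 0, 0))
            puts("counter read as zero");

        printf("classify('7') = %d\n", classify('7'));  /* prints 0 */
        return 0;
    }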
Many of these gcc specifics are very architecture-dependent, or are made possible by how gcc is implemented, and probably do not make sense to be specified by a C standard. Others are just convenient extensions to C. As the Linux kernel is built to rely on these extensions, other compilers have to provide the same extensions as gcc to be able to build the kernel.
It's not that Linux had to rely on these features of gcc; e.g. the NetBSD kernel relies very little on gcc-specific stuff.
GCC supports nested functions, which are not part of the C99 standard. That said, some analysis is required to see how prevalent they actually are within the Linux kernel.
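For reference, a nested function looks like this; it is valid GNU C (gcc's default dialect) but not C99:

    #include <stdio.h>

    int main(void)
    {
        int threshold = 3;

        /* Nested function: a GNU extension. It may refer to variables
           of the enclosing function, here 'threshold'. */
        int above(int v) { return v > threshold; }

        int data[] = {1, 4, 2, 5, 3};
        int count = 0;
        for (unsigned i = 0; i < sizeof data / sizeof data[0]; i++)
            if (above(data[i]))
                count++;

        printf("%d values above %d\n", count, threshold);  /* prints: 2 values above 3 */
        return 0;
    }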
The Linux kernel was written to be compiled by GCC, so standard compliance was never an objective for the kernel developers.
And if GCC offers some useful extensions that make coding easier or the compiled kernel smaller or faster, using them was a natural choice.
I guess it's not that they are really necessary. It's just that there are many useful ones, and cross-compiler portability is not enough of an issue for the Linux kernel to forgo the niceties. Not to mention the sheer amount of work it would take to stop relying on the extensions.
