Atomic Operations in C on Linux - c

I am trying to port some code I wrote from Mac OS X to Linux and am struggling to find a suitable replacement for the OS X-only OSAtomic.h. I found the GCC __sync* family, but I am not sure it will be compatible with the older compiler and kernel I have. I need the code to run on GCC 4.1.2 and kernel 2.6.18.
The particular operations I need are:
Increment
Decrement
Compare and Swap
What is weird is that running locate stdatomic.h on the Linux machine finds the header file (in a C++ directory), whereas running the same command on my OS X machine (GCC 4.6.3) returns nothing. What do I have to install to get the stdatomic library, and will it work with GCC 4.1.2?
As a side note, I can't use any third party libraries.

Well, nothing stops you from using the OSAtomic operations on other platforms. The sources of the OSAtomic operations for ARM, x86 and PPC are part of Apple's Libc, which is open source. Just make sure you are not using OSSpinLock, as that is specific to Mac OS X, although it can easily be replaced with Linux futexes.
See these:
http://opensource.apple.com/source/Libc/Libc-594.1.4/i386/sys/OSAtomic.s
http://opensource.apple.com/source/Libc/Libc-594.1.4/ppc/sys/OSAtomic.s
http://opensource.apple.com/source/Libc/Libc-594.1.4/arm/sys/OSAtomic.s
Alternatively, you can use the __sync_* family, which I believe works on most platforms and is described here: http://gcc.gnu.org/wiki/Atomic
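As a minimal sketch of the three operations the question asks for, written with the __sync_* builtins: the wrapper names (atomic_increment, atomic_decrement, atomic_compare_and_swap) are made up for illustration, and only the builtins themselves come from GCC.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wrapper names; only the __sync_* builtins are GCC's. */

/* Atomically add 1 and return the new value. */
int atomic_increment(volatile int *value)
{
    assert(value != NULL);
    return __sync_add_and_fetch(value, 1);
}

/* Atomically subtract 1 and return the new value. */
int atomic_decrement(volatile int *value)
{
    assert(value != NULL);
    return __sync_sub_and_fetch(value, 1);
}

/* If *value equals expected, store desired. Returns nonzero on success. */
int atomic_compare_and_swap(volatile int *value, int expected, int desired)
{
    assert(value != NULL);
    return __sync_bool_compare_and_swap(value, expected, desired);
}
```

These builtins act as full barriers. On 32-bit x86 with an old compiler you may need something like -march=i486 or later, because the plain i386 target has no compare-and-exchange instruction and the link can fail with an undefined __sync_* symbol.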

The OpenPA project provides a portable library of atomic operations under an MIT-style license. This is one I have used before and it is pretty straightforward. The code for your operations would look like
#include "opa_primitives.h"
OPA_int_t my_atomic_int = OPA_INT_T_INITIALIZER(0);
/* increment */
OPA_incr_int(&my_atomic_int);
/* decrement */
OPA_decr_int(&my_atomic_int);
/* compare and swap */
old = OPA_cas_int(&my_atomic_int, expected, new);
It also contains fine-grained memory barriers (i.e. read, write, and read/write) instead of just a full memory fence.
The main header file has a comment showing the operations that are available in the library.

The GCC __sync atomic intrinsics have been available since GCC 4.1, so they are present in 4.1.2.
Nothing stops you from building GCC 4.7 or Clang with GCC 4.1.2 and then getting all the newer features, such as C11 atomics.
As a last resort, there are many places where you can find BSD-licensed assembler implementations of atomics.
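If you do build a newer compiler, the same three operations can be sketched with the __atomic_* builtins that arrived with GCC 4.7 (Clang provides them too). The helper names below are hypothetical, and sequentially consistent ordering is assumed because it is the simplest choice, not because the question requires it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper names; the __atomic_* builtins need GCC 4.7+ or Clang. */

/* Atomically add 1 and return the new value. */
int counter_increment(int *counter)
{
    assert(counter != NULL);
    return __atomic_add_fetch(counter, 1, __ATOMIC_SEQ_CST);
}

/* Atomically subtract 1 and return the new value. */
int counter_decrement(int *counter)
{
    assert(counter != NULL);
    return __atomic_sub_fetch(counter, 1, __ATOMIC_SEQ_CST);
}

/* Strong compare-and-swap. On failure, *expected receives the value found. */
bool counter_compare_exchange(int *counter, int *expected, int desired)
{
    assert(counter != NULL && expected != NULL);
    return __atomic_compare_exchange_n(counter, expected, desired,
                                       false /* strong, not weak */,
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}
```

Unlike the __sync_* form, a failed compare-exchange writes the observed value back into *expected, which is convenient in retry loops.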

Related

OpenCL 1.2 compiling kernel binary using LLVM

Say I have the OpenCL kernel,
/* Header to make Clang compatible with OpenCL */
/* Test kernel */
__kernel void test(long K, const global float *A, global float *b)
{
    for (long i=0; i<K; i++)
        for (long j=0; j<K; j++)
            b[i] = 1.5f * A[K * i + j];
}
I'm trying to figure out how to compile this to a binary which can be loaded into OpenCL using the clCreateProgramWithBinary command.
I'm on a Mac (Intel GPU), and thus I'm limited to OpenCL 1.2. I've tried a number of different variations on the command,
clang -cc1 -triple spir test.cl -O3 -emit-llvm-bc -o test.bc -cl-std=cl1.2
but the binary always fails when I try to build the program. I'm at my wits' end with this, it's all so confusing and poorly documented.
The performance of the above test function can, in regular C, be significantly improved by applying the standard LLVM compiler optimization flag -O3. My understanding is that this flag somehow takes advantage of the contiguous memory access pattern of the inner loop to improve performance. I'd be more than happy to hear from anyone who wants to fill in the details on this.
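For reference, a plain C port of the kernel might look like the sketch below; this is the kind of function the -O3 comparison above refers to. The name test_host is hypothetical, and the loop body is kept identical to the kernel so that timings stay comparable.

```c
#include <assert.h>

/* Hypothetical host-side port of the kernel above, for timing with and without -O3. */
void test_host(long K, const float *A, float *b)
{
    assert(K >= 0);
    for (long i = 0; i < K; i++) {
        /* j walks A[K*i + 0] .. A[K*i + K-1]: consecutive addresses (unit stride). */
        for (long j = 0; j < K; j++) {
            b[i] = 1.5f * A[K * i + j];
        }
    }
}
```

Compiling this once with -O0 and once with -O3 and timing both is one way to reproduce the difference described above. As in the kernel, each b[i] ends up holding 1.5 times the last element of row i.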
I'm also wondering how I can first convert to SPIR code, and then convert that to a buildable binary. Eventually I would like to find a way to apply the -O3 compiler optimizations to my kernel, even if I have to manually modify the SPIR (as difficult as that will be).
I've also gotten the SPIRV-LLVM-Translator tool working (as far as I can tell), and ran,
./llvm-spirv test.bc -o test.spv
but this binary fails to load at the clCreateProgramWithBinary step, so I can't even get to the build step.
Possibly SPIR-V doesn't work with OpenCL 1.2 and I have to use clCreateProgramWithIL, which unfortunately doesn't exist in OpenCL 1.2. It's difficult to say for sure why it doesn't work.
Please see my previous question here for some more context on this problem.
I don't believe there's any standardised bitcode file format that's available across implementations, at least at the OpenCL 1.x level.
As you're talking specifically about macOS, have you investigated Apple's openclc compiler? This is also what Xcode invokes when you compile a .cl file as part of a target. The compiler is located in /System/Library/Frameworks/OpenCL.framework/Libraries/openclc; it does have comprehensive --help output but that's not a great source for examples on how to use it.
Instead, I recommend you try the OpenCL-in-Xcode tutorial, and inspect the build commands it ends up running:
https://developer.apple.com/library/archive/documentation/Performance/Conceptual/OpenCL_MacProgGuide/XCodeHelloWorld/XCodeHelloWorld.html
You'll find it produces bitcode files (.bc) for 4 "architectures": i386, x86_64, "gpu_64", and "gpu_32". It also auto-generates some C code which loads this code by calling gclBuildProgramBinaryAPPLE().
I don't know if you can untangle it further than that but you certainly can ship bitcode which is GPU-independent using this compiler.
I should point out that OpenCL is deprecated on macOS, so if that's the only platform you're targeting, you really should go for Metal Compute instead. It has much better tooling and will be actively supported for longer. For cross-platform projects it might still make sense to use OpenCL even on macOS, although for shipping kernel binaries instead of source, it's likely you'll have to use platform-specific code for loading those anyway.

How do I build newlib for size optimization?

I'm building an arm-eabi-gcc toolchain with Newlib 2.5.0 as the target C library.
The target embedded system would prefer smaller code size over execution speed. How do I configure newlib to favour smaller code size?
The default build does things like produce a version of strstr that is over 1KB in code size.
There is fat in Newlib that can be addressed with Newlib-nano, which is already part of GCC ARM Embedded, as discussed here. (Note that the article is from 2014, so the information may be outdated, but there appears to be Newlib-nano support in the current v6-2017 release too.)
It removes some features added after C89 that are rarely used in MCU-based embedded systems, simplifies complex functions such as formatted I/O, and removes wide-character support from functions that are not wide-character specific. Critically, with respect to this question, the default build is already size-optimised (-Os).
Configure newlib like this:
CFLAGS_FOR_TARGET="-DPREFER_SIZE_OVER_SPEED=1 -Os" \
../newlib-2.5.0/configure
(I've omitted the rest of the arguments I used for configure; they don't change based on this issue.)
There isn't a configure flag, but the configure script reads certain variables from the environment. CFLAGS_FOR_TARGET holds the flags used when building for the target system.
It is not to be confused with CFLAGS_FOR_BUILD, which holds the flags that would be used if the build needed to make any auxiliary executables to run on the build system as part of the build process.
I couldn't find any official documentation on this, but the source code contains many tests for PREFER_SIZE_OVER_SPEED or __OPTIMIZE_SIZE__. Based on a quick grep, these two flags are almost identical. The only difference I found is in the printf family: if a null pointer is passed for %s, the former translates it to (null), whereas the latter ploughs on ahead, probably causing a crash.
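To make the mechanism concrete, here is a hypothetical illustration (not newlib's actual source) of how library code can switch implementations on those two macros. GCC predefines __OPTIMIZE_SIZE__ under -Os, while PREFER_SIZE_OVER_SPEED has to be passed explicitly, as in the CFLAGS_FOR_TARGET line above. The function name find_substring and the helper macro SMALL_BUILD are made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration, not newlib's source. */
#if defined(PREFER_SIZE_OVER_SPEED) || defined(__OPTIMIZE_SIZE__)
#define SMALL_BUILD 1
#else
#define SMALL_BUILD 0
#endif

/* Find needle in haystack; returns NULL when it is absent. */
const char *find_substring(const char *haystack, const char *needle)
{
    assert(haystack != NULL && needle != NULL);
#if SMALL_BUILD
    /* Compact O(n*m) scan: little code, no lookup tables. */
    size_t needle_len = strlen(needle);
    for (;; haystack++) {
        if (strncmp(haystack, needle, needle_len) == 0)
            return haystack;
        if (*haystack == '\0')
            return NULL;
    }
#else
    /* Defer to the C library's (typically larger, faster) strstr. */
    return strstr(haystack, needle);
#endif
}
```

Both branches return the same results; only the code size and speed trade-off differs, which is the kind of choice the two macros control.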

Can I use <stdatomic.h> from C11 in a Linux driver, or must I use the Linux memory-barrier functions?

Can I use #include <stdatomic.h> and atomic_thread_fence() with memory_order from C11 in a Linux driver (kernel space), or must I use the Linux memory-barrier functions:
http://lxr.free-electrons.com/source/Documentation/memory-barriers.txt
http://lxr.free-electrons.com/source/Documentation/atomic_ops.txt
Using:
Linux-kernel 2.6.18 or greater
GCC 4.7.2 or greater
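For context, the user-space C11 pattern the question has in mind looks roughly like the sketch below. The mailbox type and function names are hypothetical. This is ordinary user-space code and says nothing about whether it may be used in the kernel. As far as I know, <stdatomic.h> itself only shipped with GCC 4.9, so GCC 4.7.2 would not provide the header.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mailbox shared between a producer and a consumer thread. */
struct mailbox {
    int payload;        /* plain data, published via the flag */
    atomic_bool ready;  /* set once payload is valid */
};

void mailbox_publish(struct mailbox *box, int value)
{
    assert(box != NULL);
    box->payload = value;
    /* Release fence: the payload write cannot be reordered after the flag store. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&box->ready, true, memory_order_relaxed);
}

bool mailbox_try_consume(struct mailbox *box, int *out)
{
    assert(box != NULL && out != NULL);
    if (!atomic_load_explicit(&box->ready, memory_order_relaxed))
        return false;
    /* Acquire fence: the payload read cannot be reordered before the flag load. */
    atomic_thread_fence(memory_order_acquire);
    *out = box->payload;
    return true;
}
```

In kernel code, the roughly corresponding tools are the barrier primitives described in memory-barriers.txt, linked above.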
If you are writing kernel code, you should do it in C, and do it in the version of C required by the current kernel (shipping gcc). If you want to get it accepted into mainline (or write it as if it were going to get accepted), you should use the Linux functions. You will also find that they work without unexpected surprises, and you will get better debugging help.
Summary: use the linux functions.
EDIT:
It seems not to work.
With or without does not make any difference.
The driver may compile, but the library will fall back to plain integers or NOPs.
It seems to work.
atomic_store() and atomic_load() provide the threads synchronization I need between the kernel module driver and the userland program.
What is unclear is whether a fallback is being used, that is, whether the compiler emits plain integer accesses and regular assembly instructions.
Feel free to take a look at the source code,
in the functions:
intelfreq.c / Core_Cycle()
and
corefreqd.c / Core_Cycle()

Can an executable generated by compiling C code be copied and run on any different OS (UNIX)?

I am a Java programmer, but I have a few things to do in C, so I started with a simple example, shown below. If I compile it and generate an executable file (hello), can I run that executable on any Unix platform without the original source file (hello.c)? Also, is there a way to read the data back from the executable, that is, to decompile it to the original file (hello.c)?
[oracle#oracleapps test]$ cat hello.c
#include <stdio.h>
int main(){
    int i,data =0;
    for(i=1;i<=64;i+=1){
        data = i*2;
        printf("data=%d\n",data);
    }
    return 0;
}
To compile
gcc -Wall -W -Werror hello.c -o hello
You can run the resulting executable on platforms that are ABI-compatible with the one you compiled it for. ABI compatibility basically means that the same processor architecture and OS interfaces (plus calling convention) are used on two, possibly different, OSes. For example, you can run binaries compiled for Linux on a FreeBSD system (with the same processor type), because FreeBSD includes a Linux ABI-compatibility layer. However, it may not be possible to run a binary on all other types of Unices unless some hackery is done. For example, you can't run Mac OS X applications on Linux; however, this guy has a solution with which it's possible to use some OS X command-line tools (including the GCC compiler itself) on Linux.
Reverse engineering: there are indeed decompilers which aim to generate C code from machine code, but they're not (yet) very powerful. The reason for this is they're by nature extremely hard to write. Machine code patterns have to be recognized, and even then you can't gather all the original info. For example, types of loops, comments and non-static local variable names and most of the types are all gone during the compilation process. For example, if you have a C source file like this:
#include <stdio.h>

int main(int argc, char **argv)
{
    int i;
    for (i = 0; i < 10; i++)
    {
        printf("I is: %d\n", i); /* Write the value of I */
    }
    return 0;
}
a C decompiler may be able to reconstruct the following code:
int main(int _var1, void *_var2)
{
    int _var3 = 0;
    while (_var3 < 10)
    {
        printf("I is: %d\n", _var3);
        _var3 = _var3 + 1;
    }
    return 0;
}
But this would be a rather advanced decompiler, such as this one.
You can't run the executable on any platform.
You can run the executable on other machines (or this one) without the .c file, provided they run the same OS and distro on the same hardware.
You can use a decompiler or disassembler to read the file and view it as C or assembly; the result won't look much like the original .c file.
The compiled file is pure machine code (plus some metadata), so it is self-sufficient in that it does not require the source files to be present. The downside? Machine code is both OS-specific and platform-specific. By platform, we usually mean roughly the CPU's instruction set, e.g. "x86" or "PowerPC", but code compiled with certain compiler flags may require specific instruction set extensions. The OS dependence is caused not only by different formats for executable files (e.g. ELF as opposed to PE), but also by the use of OS-specific services, or of common OS services in an OS-specific manner (e.g. system calls). In addition, almost all nontrivial code depends on some libraries (a C runtime library at least), so you probably won't be able to run an executable without having the right libraries in compatible versions. So no, your executable likely won't run on a 10-year-old proprietary UNIX, and may not run on different Linux distributions (though with your program there's a good chance it does, because it likely only depends on glibc).
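The difference in executable formats mentioned above is visible in the first few bytes of a file. The following is a small sketch, with a hypothetical function name and enum, that classifies a header buffer by its magic number. It inspects only the signature and does not validate the rest of the file.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: classify the first bytes of an executable file. */
enum exe_format { EXE_UNKNOWN, EXE_ELF32, EXE_ELF64, EXE_MZ };

enum exe_format identify_executable(const unsigned char *header, size_t length)
{
    assert(header != NULL);
    /* ELF starts with 0x7F 'E' 'L' 'F'; byte 4 is the class (1 = 32-bit, 2 = 64-bit). */
    if (length >= 5 && header[0] == 0x7F && header[1] == 'E' &&
        header[2] == 'L' && header[3] == 'F') {
        if (header[4] == 1)
            return EXE_ELF32;
        if (header[4] == 2)
            return EXE_ELF64;
        return EXE_UNKNOWN;
    }
    /* PE files (and older DOS executables) begin with the 'M' 'Z' signature. */
    if (length >= 2 && header[0] == 'M' && header[1] == 'Z')
        return EXE_MZ;
    return EXE_UNKNOWN;
}
```

Reading the first handful of bytes of hello with fread and passing them to this function would report an ELF class on Linux. A matching format is necessary but not sufficient, since the system calls and libraries still have to match as described above.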
While machine code can be easily disassembled, the result is very low-level and useless to many people. Decompilation to C is almost always much harder, though there are attempts. The algorithms can be recovered, simply because they have to be encoded in the machine code somehow. Assuming you didn't compile for debugging, it will never recover comments, formatting, variable names, etc. so even a "perfect" decompiler would yield a different C file from the one you put in.
No ... each platform may have different executable format requirements, different hardware architectures, different executable memory layouts determined by the linker, etc. A compiled executable is "native" to the platform it was compiled for, not to other platforms. You can cross-compile for another architecture on your current machine, though.
For instance, even though they may have many similarities, an executable compiled on Linux x86 is not guaranteed to run under BSD, depending on its flavor (i.e., you could probably run it under FreeBSD but typically not under OS X's Darwin version of BSD, even though both machines may have the same underlying hardware architecture). You also couldn't compile something on an SGI MIPS machine running IRIX and run it on a Sun SPARC running Solaris.
With C programs, the program is tied to the environment it was compiled for (which is usually the same as the platform it was compiled on, unless you are cross-compiling). You could copy something built for one version of Linux (and a particular hardware architecture) to another machine with the same architecture running the same version of Linux, and you'll be fine. You can often get away with running it on a related version of Linux. But you won't get x86/64 code to run on an IA32 machine, nor on a PPC machine, nor on a SPARC machine. You can likely get IA32 code to run on an x86/64 machine, if the basic O/S is sufficiently similar. And you may or may not be able to get something compiled for Debian to run under RedHat or vice versa; it depends on which libraries your program uses.
Java avoids this by compiling to platform-neutral byte code and providing a platform-specific JVM (JRE) to run it on each platform. This WORA (Write Once, Run Anywhere) behaviour was a key selling point for Java.
Yes, you can run it on any unix qemu runs on. This is pretty comparable to java programs, which you can run on any unix the jvm runs on...

How to force gcc to use int for system calls, not sysenter?

Is it possible to force gcc to use the int instruction for all system calls, rather than sysenter? This question may sound strange, but I have to compile some projects like Python and Firefox this way.
Summary
Thanks to jbcreix, I've downloaded glibc 2.9 source code, and modified the lines in sysdeps/unix/sysv/linux/i386/sysdep.h, to disable use of sysenter by #undef I386_USE_SYSENTER, and it works.
Recompile your C library after replacing sysenter by int 0x80 in syscall.s and link again.
This is not compiler generated code which means you are lucky.
The ultimate origin of the actual syscall is here, as the OP says:
http://cvs.savannah.gnu.org/viewvc/libc/sysdeps/unix/sysv/linux/i386/sysdep.h?root=libc&view=markup
And as I suspected, there really is a syscall.S; it's just that the glibc sources are a labyrinth.
http://cvs.savannah.gnu.org/viewvc/libc/sysdeps/unix/sysv/linux/i386/syscall.S?root=libc&view=markup
So I think he got it right, asveikau.
You don't modify gcc; you modify libc (or more accurately, recompile it) and the kernel. gcc doesn't emit sysenter instructions; it generates calls to the generic syscall(2) interface, which presents a unified front end to system call entry and exit.
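To illustrate the point that the compiler only emits an ordinary call, here is a minimal Linux-only sketch using the generic syscall() wrapper. The helper name raw_getpid is hypothetical. Whether the entry into the kernel then happens through int 0x80, sysenter or something else is decided inside the C library, not in the code gcc generates for this function.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel for our PID via the generic syscall() wrapper. */
pid_t raw_getpid(void)
{
    /* From gcc's point of view this is just a function call into libc. */
    long result = syscall(SYS_getpid);
    assert(result > 0);
    return (pid_t)result;
}
```

Disassembling the compiled function should show a plain call to syscall, which is why rebuilding libc rather than gcc changes the entry instruction.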
Or, you could use a Pentium; SYSENTER wasn't introduced until PII =]. Note the following KernelTrap link for the interesting methods used by Linux: http://kerneltrap.org/node/531
