How to do an atomic increment and fetch in C?

How to do an atomic increment and fetch in C? - c

I'm looking for a way to atomically increment a short, and then return that value. I need to do this both in kernel mode and in user mode, so it's in C, under Linux, on Intel 32bit architecture. Unfortunately, due to speed requirements, a mutex lock isn't going to be a good option.
Is there any other way to do this? At this point, it seems like the only option available is to inline some assembly. If that's the case, could someone point me towards the appropriate instructions?

GCC __atomic_* built-ins
As of GCC 4.8, __sync built-ins have been deprecated in favor of the __atomic built-ins: https://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/_005f_005fatomic-Builtins.html
They implement the C++ memory model, and std::atomic uses them internally.
The following POSIX threads example fails consistently with ++ on x86-64, and always works with _atomic_fetch_add.
main.c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
enum CONSTANTS {
NUM_THREADS = 1000,
NUM_ITERS = 1000
};
int global = 0;
void* main_thread(void *arg) {
int i;
for (i = 0; i < NUM_ITERS; ++i) {
__atomic_fetch_add(&global, 1, __ATOMIC_SEQ_CST);
/* This fails consistently. */
/*global++*/;
}
return NULL;
}
int main(void) {
int i;
pthread_t threads[NUM_THREADS];
for (i = 0; i < NUM_THREADS; ++i)
pthread_create(&threads[i], NULL, main_thread, NULL);
for (i = 0; i < NUM_THREADS; ++i)
pthread_join(threads[i], NULL);
assert(global == NUM_THREADS * NUM_ITERS);
return EXIT_SUCCESS;
}
Compile and run:
gcc -std=c99 -Wall -Wextra -pedantic -o main.out ./main.c -pthread
./main.out
Disassembly analysis at: How do I start threads in plain C?
Tested in Ubuntu 18.10, GCC 8.2.0, glibc 2.28.
C11 _Atomic
In 5.1, the above code works with:
_Atomic int global = 0;
global++;
And C11 threads.h was added in glibc 2.28, which allows you to create threads in pure ANSI C without POSIX, minimal runnable example: How do I start threads in plain C?

GCC supports atomic operations:
gcc atomics

Related

Implement custom stack canary handling without the standard library

I am trying to implement stack canaries manually and without the standard library. Therefore I have created a simple PoC with the help of this guide from the OSDev wiki. The article suggests that a simple implementation must provide the __stack_chk_guard variable and the __stack_chk_fail() handler.
However, when I compile using GCC and provide the -fstack-protector-all flag, the executable does not contain any stack canary check at all. What am I missing to get GCC to include the stack canary logic?
gcc -Wall -nostdlib -nodefaultlibs -fstack-protector-all -g -m64 -o poc main.c customlib.h
main.c
#include "customlib.h"
#define STACK_CHK_GUARD (0xDEADBEEFFFFFFFF & ~0xFF)
uintptr_t __stack_chk_guard = STACK_CHK_GUARD;
__attribute__((noreturn)) void __stack_chk_fail()
{
__exit(123);
while(1);
}
int main()
{
__attribute__((unused)) char buffer[16];
for (size_t index = 0; index < 32; index++)
{
buffer[index] = 'A';
}
return 0;
}
customlib.h
This code is mostly irrelevant and is just necessary so that the program can be compiled and linked correctly.
typedef unsigned long int size_t;
typedef unsigned long int uintptr_t;
size_t __syscall(size_t arg1, size_t arg2, size_t arg3, size_t arg4, size_t arg5, size_t arg6)
{
asm("int $0x80\n"
: "=a"(arg1)
: "a"(arg1), "b"(arg2), "c"(arg3), "d"(arg4), "S"(arg5), "D"(arg6));
return arg1;
}
void _exit(int exit_code)
{
__syscall(1, exit_code, 0, 0, 0, 0);
while(1);
}
extern int main();
void _start()
{
main();
_exit(0);
}
GCC version 10.2.0, Linux 5.10.36-2-MANJARO GNU/Linux

It looks like the Arch gcc package (which the Manjaro package is based on) is turning off -fstack-protector when building without the standard library (Done for Arch bug 64270).
This behavior is apparently also present in Gentoo.
I haven't tried this, but I believe you should be able to dump the GCC specs using gcc -dumpspecs into a file, keeping only the section *cc1_options, removing %{nostdlib|nodefaultlibs|ffreestanding:-fno-stack-protector} from it, and passing it to gcc with gcc -specs=your_spec_file.
Alternately, you can rebuild the gcc package with this patch removed.

Linker can't find semaphore functions

I'm trying to make a C programm, that will execute subprocesses, which will be interact using semaphore.
Then I compile code, gcc throw referencing error - because it doesn't know about functions "sem_init", "sem_post" and "sem_wait", even though I include semaphore.h library.
Here's how it look:
Code:
#include <stdio.h>
#include <semaphore.h>
#include <pthread.h>
#include <unistd.h>
#define LETTER_COUNT 26
#define THREADS 2
char letter[LETTER_COUNT] = "aBCDefghiJklMNoPqrsTuvWxyZ";
pthread_t t[THREADS];
sem_t sem[THREADS];
void print_letter(void) {
//print string
}
void* reorder(void* d) {
(void)d;
//do some work
return NULL;
}
void* switch_case(void* d) {
(void)d;
//do some work
return NULL;
}
int main(void) {
int i;
for(i = 0; i < THREADS; i++) {
if(sem_init(&sem[i], 0, 0) == -1) {
perror("sem_init");
return -1;
}
}
pthread_create(&t[0], NULL, reorder, NULL);
pthread_create(&t[1], NULL, switch_case, NULL);
while(1) {
i = (i + 1) % (THREADS - 1);
sem_post(&sem[i]);
sem_wait(&sem[2]);
print_letter();
sleep(1);
}
return 0;
}
Error:
gcc -Wall task4.c -o task4.o
Undefined first referenced
symbol in file
sem_init /var/tmp//cc0i56ka.o
sem_post /var/tmp//cc0i56ka.o
sem_wait /var/tmp//cc0i56ka.o
ld: fatal: symbol referencing errors. No output written to task4.o
collect2: ld returned 1 exit status
I'm trying to find some information about this problem, but I can't find any working solutions. Maybe I should use some compilation flag (like -lsocket)?

As per man sem_init (and friends)
gcc -Wall task4.c -o task4.o -lpthread
On some system, the 'librt' shared library is built against shared libpthread, and referencing -lrt will imply -lpthread. However the man page indicate the proper command to link is to use -pthread, see below. Note that -pthread will invoke MT semantics, as needed, usually -lpthread, but other libraries, flags or #defines. For example, on GCC/Mint19, it will define -D_REENTRANT.
From man sem_init
AME
sem_init - initialize an unnamed semaphore
SYNOPSIS
#include
int sem_init(sem_t *sem, int pshared, unsigned int value);
Link with -lpthread.
From man gcc
Options Controlling the Preprocessor
-pthread
Define additional macros required for using the POSIX threads library. You should use this option consistently for both compilation
and linking. This option is supported on GNU/Linux targets, most other Unix derivatives, and also on x86 Cygwin and MinGW targets.
Options for Linking
-pthread
Link with the POSIX threads library. This option is supported on GNU/Linux targets, most other Unix derivatives, and also on x86
Cygwin and MinGW targets. On some targets this option also sets flags for the preprocessor, so it should be used consistently for both
compilation and linking.

Use openMP only when an argument is passed to the program

Is there a good way to use OpenMP to parallelize a for-loop, only if an -omp argument is passed to the program?
This seems not possible, since #pragma omp parallel for is a preprocessor directive and thus evaluated even before compile time and of course it is only certain if the argument is passed to the program at runtime.
At the moment I am using a very ugly solution to achieve this, which leads to an enormous duplication of code.
if(ompDefined) {
#pragma omp parallel for
for(...)
...
}
else {
for(...)
...
}

I think what you are looking for can be solved using a CPU dispatcher technique.
For benchmarking OpenMP code vs. non-OpenMP code you can create different object files from the same source code like this
//foo.c
#ifdef _OPENMP
double foo_omp() {
#else
double foo() {
#endif
double sum = 0;
#pragma omp parallel for reduction(+:sum)
for(int i=0; i<1000000000; i++) sum += i%10;
return sum;
}
Compile like this
gcc -O3 -c foo.c
gcc -O3 -fopenmp -c foo.c -o foo_omp.o
This creates two object files foo.o and foo_omp.o. Then you can call one of these functions like this
//bar.c
#include <stdio.h>
double foo();
double foo_omp();
double (*fp)();
int main(int argc, char *argv[]) {
if(argc>1) {
fp = foo_omp;
}
else {
fp = foo;
}
double sum = fp();
printf("sum %e\n", sum);
}
Compile and link like this
gcc -O3 -fopenmp bar.c foo.o foo_omp.o
Then I time the code like this
time ./a.out -omp
time ./a.out
and the first case takes about 0.4 s and the second case about 1.2 s on my system with 4 cores/8 hardware threads.
Here is a solution which only needs a single source file
#include <stdio.h>
typedef double foo_type();
foo_type foo, foo_omp, *fp;
#ifdef _OPENMP
#define FUNCNAME foo_omp
#else
#define FUNCNAME foo
#endif
double FUNCNAME () {
double sum = 0;
#pragma omp parallel for reduction(+:sum)
for(int i=0; i<1000000000; i++) sum += i%10;
return sum;
}
#ifdef _OPENMP
int main(int argc, char *argv[]) {
if(argc>1) {
fp = foo_omp;
}
else {
fp = foo;
}
double sum = fp();
printf("sum %e\n", sum);
}
#endif
Compile like this
gcc -O3 -c foo.c
gcc -O3 -fopenmp foo.c foo.o

You can set the number of threads at run-time by calling omp_set_num_threads:
#include <omp.h>
int main()
{
int threads = 1;
#ifdef _OPENMP
omp_set_num_threads(threads);
#endif
#pragma omp parallel for
for(...)
{
...
}
}
This isn't quite the same as disabling OpenMP, but it will stop it running calculations in parallel. I've found it's always a good idea to set this using a command line switch (you can implement this using GNU getopt or Boost.ProgramOptions). This allows you to easily run single-threaded and multi-threaded tests on the same code.
As Vladimir F pointed out in the comments, you can also set the number of threads by setting the environment variable OMP_NUM_THREADS before executing your program:
gcc -Wall -Werror -pedantic -O3 -fopenmp -o test test.c
OMP_NUM_THREADS=1
./test
unset OMP_NUM_THREADS
Finally, you can disable OpenMP at compile-time by not providing GCC with the -fopenmp option. However, you will need to put preprocessor guards around any lines in your code that require OpenMP to be enabled (see above). If you want to use some functions included in the OpenMP library without actually enabling the OpenMP pragmas you can simply link against the OpenMP library by replacing the -fopenmp option with -lgomp.

One solution would be to use the preprocessor to ignore the pragma statement if you do not pass an additional flag to the compiler.
For example in your code you might have:
#ifdef MP_ENABLED
#pragma omp parallel for
#endif
for(...)
...
and then when you compile you can pass a flag to the compiler to define the MP_ENABLED macro. In the case of GCC (and Clang) you would pass -DMP_ENABLED.
You then might compile with gcc as
gcc SOME_SOURCE.c -I SOME_INCLUDE.h -lomp -DMP_ENABLED -o SOME_OUTPUT
then when you want to disable the parallelism you can make a minor tweek to the compile command by dropping -DMP_ENABLED.
gcc SOME_SOURCE.c -I SOME_INCLUDE.h -lomp -DMP_ENABLED -o SOME_OUTPUT
This causes the macro to be undefined which leads to the preprocessor ignoring the pragma.
You could also use a similar solution using ifndef instead depending on whether you consider the parallel behavior the default or not.
Edit: As noted by some comments, inclusion of OMP lib defines some macros such as _OPENMP which you could use in place of your own user-defined macros. That looks to be a superior solution, but the difference in effort is reasonably small.

OpenMP in C using GCC in Windows to create DLL

I've recently started to play around with OpenMP and like it very much.
I am a just-for-fun Classic-VB programmer and like coding functions for my VB programs in C. As such, I use Windows 7 x64 and GCC 4.7.2.
I usually set up all my C functions in one large C file and then compile a DLL out of it. Now I would like to use OpenMP in my DLL.
First of all, I set up a simple example and compiled an exe file from it:
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
int n = 520000;
int i;
int a[n];
int NumThreads;
omp_set_num_threads(4);
#pragma omp parallel for
for (i = 0; i < n; i++)
{
a[i] = 2 * i;
NumThreads = omp_get_num_threads();
}
printf("Value = %d.\n", a[77]);
printf("Number of threads = %d.", NumThreads);
return(0);
}
I compile that using gcc -fopenmp !MyC.c -o !MyC.exe and it works like a charm.
However, when I try to use OpenMP in my DLL, it fails. For example, I set up this function:
__declspec(dllexport) int __stdcall TestAdd3i(struct SAFEARRAY **InArr1, struct SAFEARRAY **InArr2, struct SAFEARRAY **OutArr) //OpenMP Test
{
int LengthArr;
int i;
int *InArrElements1;
int *InArrElements2;
int *OutArrElements;
LengthArr = (*InArr1)->rgsabound[0].cElements;
InArrElements1 = (int*) (**InArr1).pvData;
InArrElements2 = (int*) (**InArr2).pvData;
OutArrElements = (int*) (**OutArr).pvData;
omp_set_num_threads(4);
#pragma omp parallel for private(i)
for (i = 0; i < LengthArr; i++)
{
OutArrElements[i] = InArrElements1[i] + InArrElements2[i];
}
return(omp_get_num_threads());
}
The structs are defined, of course. I compile that using
gcc -fopenmp -c -DBUILD_DLL dll.c -o dll.o
gcc -fopenmp -shared -o mydll.dll dll.o -lgomp -Wl,--add-stdcall-alias
The compiler and linker do not complain (not even warnings come up) and the dll file is actually being built. But as I try to call the function from within VB, the VB compiler claims the the DLL file could not be found (run-time error 53). The strange thing about that is that as soon as one single OpenMP "command" is present inside the .c file, the VB compiler claims a missing DLL even if I call a function that does not even contain a single line of OpenMP code. When I comment all OpenMP stuff out, the function works as expected, but doesn't use OpenMP for parallelization, of course.
What is wrong here? Any help appreciated, thanks in advance! :-)

The problem most probably in this case is LD_LIBRARY_PATH is not set . You must use set LD_LIBRARY_PATH to the path that contains the dll or the system will not be able to find it and hence complains about the same

Operations from atomic.h seem to be non-atomic

The following code produces random values for both n and v. It's not surprising that n is random without being properly protected. But it is supposed that v should finally be 0. Is there anything wrong in my code? Or could anyone explain this for me? Thanks.
I'm working on a 4-core server of x86 architecture. The uname is as follows.
Linux 2.6.9-22.ELsmp #1 SMP Mon Sep 19 18:00:54 EDT 2005 x86_64 x86_64 x86_64 GNU/Linux
#include <stdio.h>
#include <pthread.h>
#include <asm-x86_64/atomic.h>
int n = 0;
atomic_t v;
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
#define LOOP 10000
void* foo(void *p)
{
int i = 0;
for(i = 0; i < LOOP; i++) {
// pthread_mutex_lock(&mutex);
++n;
--n;
atomic_inc(&v);
atomic_dec(&v);
// pthread_mutex_unlock(&mutex);
}
return NULL;
}
#define COUNT 50
int main(int argc, char **argv)
{
int i;
pthread_t pids[COUNT];
pthread_attr_t attr;
pthread_attr_init(&attr);
atomic_set(&v, 0);
for(i = 0; i < COUNT; i++) {
pthread_create(&pids[i], &attr, foo, NULL);
}
for(i = 0; i < COUNT; i++) {
pthread_join(pids[i], NULL);
}
printf("%d\n", n);
printf("%d\n", v);
return 0;
}

You should use gcc built-ins instead (see. this) This works fine, and also works with icc.
int a;
__sync_fetch_and_add(&a, 1); // atomic a++
Note that you should be aware of the cache consistency issues when you modify variables without locking.

This old post implies that
It's not obvious that you're supposed to include this kernel header in userspace programs
It's been known to fail to provide atomicity for userspace programs.
So ... Perhaps that's the reason for the problems you're seeing?

Can we get a look at the assembler output of the code (gcc -E, I think). Even thought the uname indicates it's SMP-aware, that doesn't necessarily mean it was compiled with CONFIG_SMP.
Without that, the assembler code output does not have the lock prefix and you can find your cores interfering with one another.
But I would be using the pthread functions anyway since they're portable across more platforms.

Linux kernel atomic.h is not usable from userland and never was. On x86, some of it might work, because x86 is rather synchronization-friendly architecture, but on some platforms it relies heavily on being able to do privileged operations (older arm) or at least on being able to disable preemption (older arm and sparc at least), which is not the case in userland!