Deadlock and race condition in mutex implementation - c

I'm trying to implement a mutex in C using the atomic assembly instruction "bts" to atomically set a bit and return the original value.
However, when I run the following code, it occasionally deadlocks and often shows race conditions:
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h> // for usleep
typedef unsigned char mutex;
#define MUTEX_FREE 0
#define MUTEX_BUSY 1
// adapted from http://www.acm.uiuc.edu/sigops/roll_your_own/i386/atomic.html
mutex testAndSet(mutex *m) {
int result;
asm ("bts $0, %1; sbbl %0, %0"
:"=r" (result)
:"m" (*m)
:"memory");
return (result & 1);
}
void P(mutex *m) {
// Must use atomic testAndSet to avoid race conditions
while(testAndSet(m) == MUTEX_BUSY)
usleep(10);
}
void V(mutex *m) {
*m = MUTEX_FREE;
}
//////////////
// Test:
//////////////
const int NTHREADS = 100;
const int NINCS = 100;
int counter = 0;
mutex m = MUTEX_FREE;
void criticalSection() {
int i;
for(i=0;i<NINCS;i++) {
P(&m);
counter++;
V(&m);
}
}
int main() {
int i;
pthread_t threads[NTHREADS];
for(i=0; i<NTHREADS; i++) {
pthread_create(&threads[i], NULL, (void *) &criticalSection, NULL);
}
for(i=0; i<NTHREADS; i++) {
pthread_join(threads[i], NULL);
}
printf("got counter=%d, expected=%d\n", counter, NTHREADS*NINCS);
}
The code seems to work if I use the "xchgb" instruction instead of "bts" as follows:
mutex testAndSet(mutex *m) {
unsigned char result = MUTEX_BUSY;
asm ("xchgb %1, %0"
:"=m" (*m), "=r" (result)
:"1" (result)
:"memory");
return result;
}
Where is the race condition in the original code? Shouldn't the "bts" instruction be atomic, guaranteeing thread safety?
Furthermore, is my modified solution actually correct?
(I'm running OS X 10.8 and compiling with gcc.)

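The bts instruction with a memory operand is only atomic with respect to other cores when it carries the LOCK prefix, so add it:
asm ("lock bts $0, %1; ...");
The xchg version worked because xchg with a memory operand is always locked: it asserts the LOCK# signal regardless of the presence or absence of the LOCK prefix.
Since bts also writes *m, the operand should be declared as a read/write operand ("+m" (*m)) rather than as an input.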
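For comparison, here is a minimal portable sketch of the same test-and-set lock written with C11 atomic_flag instead of inline assembly. This is a swapped-in technique, not the asker's code: the names spin_lock, spin_unlock, worker and run_spin_demo are made up for illustration, and sched_yield stands in for the original usleep(10).

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

static atomic_flag lock_flag = ATOMIC_FLAG_INIT;
static int shared_counter = 0;

/* P(): spin until test-and-set reports that the flag was previously clear. */
static void spin_lock(atomic_flag *f)
{
    while (atomic_flag_test_and_set_explicit(f, memory_order_acquire))
        sched_yield();  /* let another thread run instead of burning the CPU */
}

/* V(): a release store, so writes made inside the critical section
   are visible to the next thread that acquires the lock. */
static void spin_unlock(atomic_flag *f)
{
    atomic_flag_clear_explicit(f, memory_order_release);
}

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000; i++) {
        spin_lock(&lock_flag);
        shared_counter++;
        spin_unlock(&lock_flag);
    }
    return NULL;
}

/* Hypothetical driver: 4 threads x 1000 increments. */
int run_spin_demo(void)
{
    pthread_t t[4];
    shared_counter = 0;
    for (int i = 0; i < 4; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    return shared_counter;
}
```

Letting the compiler pick the instruction avoids having to get the LOCK prefix and the asm constraints right by hand.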

Related

How to solve the dining philosophers problem with only mutexes?

I wrote this program to solve the dining philosophers problem using Dijkstra's algorithm. Notice that I'm using an array of booleans (data->locked) instead of an array of binary semaphores.
I'm not sure if this solution is valid (hence the SO question).
Will access to the data->locked array in both the test and take_forks functions cause data races? If so, is it even possible to solve this problem using Dijkstra's algorithm with only mutexes?
I'm only allowed to use mutexes, no semaphores, no condition variables (it's an assignment).
Example of usage:
./a.out 4 1000 1000
#include <pthread.h>
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <stdbool.h>
#define NOT_HUNGRY 1
#define HUNGRY 2
#define EATING 3
#define RIGHT ((i + 1) % data->n)
#define LEFT ((i + data->n - 1) % data->n)
typedef struct s_data
{
int n;
int t_sleep;
int t_eat;
int *state;
bool *locked;
pthread_mutex_t *state_mutex;
} t_data;
typedef struct s_arg
{
t_data *data;
int i;
} t_arg;
int ft_min(int a, int b)
{
if (a < b)
return (a);
return (b);
}
int ft_max(int a, int b)
{
if (a > b)
return (a);
return (b);
}
// if the LEFT and RIGHT threads are not eating
// and thread number i is hungry, change its state to EATING
// and signal to the while loop in `take_forks` to stop blocking.
// if a thread has a state of HUNGRY then it's guaranteed
// to be out of the critical section of `take_forks`.
void test(int i, t_data *data)
{
if (
data->state[i] == HUNGRY
&& data->state[LEFT] != EATING
&& data->state[RIGHT] != EATING
)
{
data->state[i] = EATING;
data->locked[i] = false;
}
}
// set the state of the thread number i to HUNGRY
// and block until the LEFT and RIGHT threads are not EATING
// in which case they will call `test` from `put_forks`
// which will result in breaking the while loop
void take_forks(int i, t_data *data)
{
pthread_mutex_lock(data->state_mutex);
data->locked[i] = true;
data->state[i] = HUNGRY;
test(i, data);
pthread_mutex_unlock(data->state_mutex);
while (data->locked[i]);
}
// set the state of the thread number i to NOT_HUNGRY
// then signal to the LEFT and RIGHT threads
// so they can start eating when their neighbors are not eating
void put_forks(int i, t_data *data)
{
pthread_mutex_lock(data->state_mutex);
data->state[i] = NOT_HUNGRY;
test(LEFT, data);
test(RIGHT, data);
pthread_mutex_unlock(data->state_mutex);
}
void *philosopher(void *_arg)
{
t_arg *arg = _arg;
while (true)
{
printf("%d is thinking\n", arg->i);
take_forks(arg->i, arg->data);
printf("%d is eating\n", arg->i);
usleep(arg->data->t_eat * 1000);
put_forks(arg->i, arg->data);
printf("%d is sleeping\n", arg->i);
usleep(arg->data->t_sleep * 1000);
}
return (NULL);
}
void data_init(t_data *data, pthread_mutex_t *state_mutex, char **argv)
{
int i = 0;
data->n = atoi(argv[1]);
data->t_eat = atoi(argv[2]);
data->t_sleep = atoi(argv[3]);
pthread_mutex_init(state_mutex, NULL);
data->state_mutex = state_mutex;
data->state = malloc(data->n * sizeof(int));
data->locked = malloc(data->n * sizeof(bool));
while (i < data->n)
{
data->state[i] = NOT_HUNGRY;
data->locked[i] = true;
i++;
}
}
int main(int argc, char **argv)
{
pthread_mutex_t state_mutex;
t_data data;
t_arg *args;
pthread_t *threads;
int i;
if (argc != 4)
{
fputs("Error\nInvalid argument count\n", stderr);
return (1);
}
data_init(&data, &state_mutex, argv);
args = malloc(data.n * sizeof(t_arg));
i = 0;
while (i < data.n)
{
args[i].data = &data;
args[i].i = i;
i++;
}
threads = malloc(data.n * sizeof(pthread_t));
i = 0;
while (i < data.n)
{
pthread_create(threads + i, NULL, philosopher, args + i);
i++;
}
i = 0;
while (i < data.n)
pthread_join(threads[i++], NULL);
}
Your spin loop while (data->locked[i]); is a data race: you don't hold the lock while reading data->locked[i], so another thread could take the lock and write to that same variable while you are reading it. In fact, you rely on that happening. But this is undefined behavior.
Immediate practical consequences are that the compiler can delete the test (since in the absence of a data race, data->locked[i] could not change between iterations), or delete the loop altogether (since it's now an infinite loop, and nontrivial infinite loops are UB). Of course other undesired outcomes are also possible.
So you have to hold the mutex while testing the flag. If it's false, you should then hold the mutex until you set it true and do your other work; otherwise there is a race where another thread could get it first. If it's true, then drop the mutex, wait a little while, take it again, and retry.
(How long is a "little while", and what work you choose to do in between, are probably things you should test. Depending on what kind of fairness algorithms your pthread implementation uses, you might run into situations where take_forks succeeds in retaking the lock even if put_forks is also waiting to lock it.)
Of course, in a "real" program, you wouldn't do it this way in the first place; you'd use a condition variable.
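The polling approach described above can be sketched roughly as follows. This is only an illustration under stated assumptions: locked_flag stands in for data->locked[i], the helper names (flag_is_set, wait_until_unlocked, releaser, run_wait_demo) are invented, and sched_yield is used as the "little while". It also assumes, as in the question's program, that only the waiting thread itself ever sets its flag back to true, so releasing the mutex right after the read is safe.

```c
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

static pthread_mutex_t state_mutex = PTHREAD_MUTEX_INITIALIZER;
static bool locked_flag = true;  /* stands in for data->locked[i] */

/* Read the flag only while holding the mutex, so there is no data race. */
static bool flag_is_set(void)
{
    pthread_mutex_lock(&state_mutex);
    bool value = locked_flag;
    pthread_mutex_unlock(&state_mutex);
    return value;
}

/* Replacement for `while (data->locked[i]);`: drop the mutex between polls. */
void wait_until_unlocked(void)
{
    while (flag_is_set())
        sched_yield();  /* the "little while"; a short sleep would also do */
}

/* What test() does on behalf of a neighbour: clear the flag under the mutex. */
static void *releaser(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&state_mutex);
    locked_flag = false;
    pthread_mutex_unlock(&state_mutex);
    return NULL;
}

/* Hypothetical driver: returns 1 once the waiter has seen the flag cleared. */
int run_wait_demo(void)
{
    pthread_t t;
    pthread_mutex_lock(&state_mutex);
    locked_flag = true;
    pthread_mutex_unlock(&state_mutex);
    pthread_create(&t, NULL, releaser, NULL);
    wait_until_unlocked();
    pthread_join(t, NULL);
    return flag_is_set() ? 0 : 1;
}
```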

Some threads never get execution when invoked in large amount

Consider the following program,
static long count = 0;
void thread()
{
printf("%d\n",++count);
}
int main()
{
pthread_t t;
sigset_t set;
int i,limit = 30000;
struct rlimit rlim;
getrlimit(RLIMIT_NPROC, &rlim);
rlim.rlim_cur = rlim.rlim_max;
setrlimit(RLIMIT_NPROC, &rlim);
for(i=0; i<limit; i++) {
if(pthread_create(&t,NULL,(void *(*)(void*))thread, NULL) != 0) {
printf("thread creation failed\n");
return -1;
}
}
sigemptyset(&set);
sigsuspend(&set);
return 0;
}
This program is expected to print 1 to 30000, but sometimes it prints 29945, 29999, 29959, etc. Why is this happening?
Because count isn't atomic, so you have a race condition both in the increment and in the subsequent print.
The function you need is atomic_fetch_add (from C11's <stdatomic.h>), which increments the counter atomically and avoids the race condition. The example on cppreference illustrates the exact problem you laid out.
Your example can be made to work with just a minor adjustment:
#include <stdio.h>
#include <signal.h>
#include <sys/resource.h>
#include <pthread.h>
#include <stdatomic.h>
static atomic_long count = 1;
void * thread(void *data)
{
printf("%ld\n", atomic_fetch_add(&count, 1));
return NULL;
}
int main()
{
pthread_t t;
sigset_t set;
int i,limit = 30000;
struct rlimit rlim;
getrlimit(RLIMIT_NPROC, &rlim);
rlim.rlim_cur = rlim.rlim_max;
setrlimit(RLIMIT_NPROC, &rlim);
for(i=0; i<limit; i++) {
if(pthread_create(&t, NULL, thread, NULL) != 0) {
printf("thread creation failed\n");
return -1;
}
}
sigemptyset(&set);
sigsuspend(&set);
return 0;
}
I made a handful of other changes, such as fixing the thread function signature and using the correct printf format for printing longs. But the atomic issue is why you weren't printing all the numbers you expected.
Why is this happening?
Because you have a data race (undefined behavior).
In particular, this statement:
printf("%d\n",++count);
modifies a global (shared) variable without any locking. Since the ++ does not atomically increment it, it's quite possible for multiple threads to read the same value (say 1234), increment it, and store the updated value in parallel, resulting in 1235 being printed repeatedly (two or more times), and one or more of the increments being lost.
A typical solution is to either use a mutex to avoid the data race, or (rarely) an atomic variable (which guarantees atomic increment). Beware: atomic variables are quite hard to get right. You are not ready to use them yet.
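A minimal sketch of the mutex variant might look like this. The names count_mutex and run_count_demo are assumptions for illustration, and the driver creates only 8 threads rather than 30000 to keep the example small.

```c
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t count_mutex = PTHREAD_MUTEX_INITIALIZER;
static long count = 0;

static void *thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&count_mutex);
    long value = ++count;    /* increment and read back under the lock */
    printf("%ld\n", value);  /* printing under the lock keeps the output in order */
    pthread_mutex_unlock(&count_mutex);
    return NULL;
}

/* Hypothetical driver: every thread increments exactly once. */
long run_count_demo(void)
{
    enum { N = 8 };
    pthread_t t[N];
    count = 0;
    for (int i = 0; i < N; i++)
        pthread_create(&t[i], NULL, thread, NULL);
    for (int i = 0; i < N; i++)
        pthread_join(t[i], NULL);
    return count;
}
```

Holding the lock across the printf is a design choice: moving the print outside the lock would still yield each number exactly once, but possibly out of order.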

achieve GCC cas function for version 4.1.2 and earlier

My new company's project has to run as 32-bit code, and the build server is CentOS 5.0 with GCC 4.1.1, which has been a nightmare.
The project uses many functions such as __sync_fetch_and_add, which are only available in GCC 4.1.2 and later.
I was told I cannot upgrade GCC, so after Googling for several hours I had to come up with another solution.
When I wrote a demo to test it, I got the wrong answer. The code below is meant to replace the function __sync_fetch_and_add:
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <pthread.h>
static int count = 0;
int compare_and_swap(int* reg, int oldval, int newval)
{
register char result;
#ifdef __i386__
__asm__ volatile ("lock; cmpxchgl %3, %0; setz %1"
: "=m"(*reg), "=q" (result)
: "m" (*reg), "r" (newval), "a" (oldval)
: "memory");
return result;
#elif defined(__x86_64__)
__asm__ volatile ("lock; cmpxchgq %3, %0; setz %1"
: "=m"(*reg), "=q" (result)
: "m" (*reg), "r" (newval), "a" (oldval)
: "memory");
return result;
#else
#error:architecture not supported and gcc too old
#endif
}
void *test_func(void *arg)
{
int i = 0;
for(i = 0; i < 2000; ++i) {
compare_and_swap((int *)&count, count, count + 1);
}
return NULL;
}
int main(int argc, const char *argv[])
{
pthread_t id[10];
int i = 0;
for(i = 0; i < 10; ++i){
pthread_create(&id[i], NULL, test_func, NULL);
}
for(i = 0; i < 10; ++i) {
pthread_join(id[i], NULL);
}
//10*2000=20000
printf("%d\n", count);
return 0;
}
Then I got the wrong result:
[root@centos-linux-7 workspace]# ./asm
17123
[root@centos-linux-7 workspace]# ./asm
14670
[root@centos-linux-7 workspace]# ./asm
14604
[root@centos-linux-7 workspace]# ./asm
13837
[root@centos-linux-7 workspace]# ./asm
14043
[root@centos-linux-7 workspace]# ./asm
16160
[root@centos-linux-7 workspace]# ./asm
15271
[root@centos-linux-7 workspace]# ./asm
15280
[root@centos-linux-7 workspace]# ./asm
15465
[root@centos-linux-7 workspace]# ./asm
16673
I realize in this line
compare_and_swap((int *)&count, count, count + 1);
count + 1 was wrong!
How, then, can I implement the same function as __sync_fetch_and_add? The compare_and_swap function works when the third parameter is constant.
By the way, is the compare_and_swap function itself correct? I just Googled for it; I'm not familiar with assembly.
I'm at my wits' end with this question.
………………………………………………………………………………………………………………………………………………………………………………………………………………………
After seeing the answer below, I used a while loop and got the right answer, but now I'm even more confused.
Here is the code:
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <pthread.h>
static unsigned long count = 0;
int sync_add_and_fetch(int* reg, int oldval, int incre)
{
register char result;
#ifdef __i386__
__asm__ volatile ("lock; cmpxchgl %3, %0; setz %1" : "=m"(*reg), "=q" (result) : "m" (*reg), "r" (oldval + incre), "a" (oldval) : "memory");
return result;
#elif defined(__x86_64__)
__asm__ volatile ("lock; cmpxchgq %3, %0; setz %1" : "=m"(*reg), "=q" (result) : "m" (*reg), "r" (newval + incre), "a" (oldval) : "memory");
return result;
#else
#error:architecture not supported and gcc too old
#endif
}
void *test_func(void *arg)
{
int i=0;
int result = 0;
for(i=0;i<2000;++i)
{
result = 0;
while(0 == result)
{
result = sync_add_and_fetch((int *)&count, count, 1);
}
}
return NULL;
}
int main(int argc, const char *argv[])
{
pthread_t id[10];
int i = 0;
for(i=0;i<10;++i){
pthread_create(&id[i],NULL,test_func,NULL);
}
for(i=0;i<10;++i){
pthread_join(id[i],NULL);
}
//10*2000=20000
printf("%u\n",count);
return 0;
}
The result is now correctly 20000. But having to wrap every call to sync_add_and_fetch in a while loop seemed silly to me, so I moved the loop into the function like this:
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <pthread.h>
static unsigned long count = 0;
int compare_and_swap(int* reg, int oldval, int incre)
{
register char result;
#ifdef __i386__
__asm__ volatile ("lock; cmpxchgl %3, %0; setz %1" : "=m"(*reg), "=q" (result) : "m" (*reg), "r" (oldval + incre), "a" (oldval) : "memory");
return result;
#elif defined(__x86_64__)
__asm__ volatile ("lock; cmpxchgq %3, %0; setz %1" : "=m"(*reg), "=q" (result) : "m" (*reg), "r" (newval + incre), "a" (oldval) : "memory");
return result;
#else
#error:architecture not supported and gcc too old
#endif
}
void sync_add_and_fetch(int *reg,int oldval,int incre)
{
int ret = 0;
while(0 == ret)
{
ret = compare_and_swap(reg,oldval,incre);
}
}
void *test_func(void *arg)
{
int i=0;
for(i=0;i<2000;++i)
{
sync_add_and_fetch((int *)&count, count, 1);
}
return NULL;
}
int main(int argc, const char *argv[])
{
pthread_t id[10];
int i = 0;
for(i=0;i<10;++i){
pthread_create(&id[i],NULL,test_func,NULL);
}
for(i=0;i<10;++i){
pthread_join(id[i],NULL);
}
//10*2000=20000
printf("%u\n",count);
return 0;
}
But when I compile this code with g++ -g -o asm asm.cpp -lpthread and run ./asm, the process hangs for more than 5 minutes. This is what top shows in another terminal:
3861 root 19 0 102m 888 732 S 400 0.0 2:51.06 asm
I'm confused: isn't this code the same as before?
The 64-bit compare_and_swap is wrong as it swaps 64 bits but int is only 32 bits.
compare_and_swap should be used in a loop which retries it until it succeeds.
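A retry loop of that kind could be sketched as below, using C11 atomic_compare_exchange_weak as a stand-in for the hand-written compare_and_swap (an assumption, since GCC 4.1 has no <stdatomic.h>). The key point is that the expected value must be refreshed before every attempt. In the hanging version from the question, oldval is passed by value and never re-read, so once another thread changes count every later attempt compares against a stale value and fails forever. The names fetch_and_add_cas and run_cas_demo are illustrative only.

```c
#include <pthread.h>
#include <stdatomic.h>

static atomic_int count;

/* Atomically add incr to *p and return the previous value (fetch-and-add). */
int fetch_and_add_cas(atomic_int *p, int incr)
{
    int old = atomic_load(p);
    /* On failure the CAS stores the value it actually found into old,
       so each retry compares against a fresh expected value. */
    while (!atomic_compare_exchange_weak(p, &old, old + incr))
        ;
    return old;
}

static void *test_func(void *arg)
{
    (void)arg;
    for (int i = 0; i < 2000; ++i)
        fetch_and_add_cas(&count, 1);
    return NULL;
}

/* Hypothetical driver: 4 threads x 2000 increments. */
int run_cas_demo(void)
{
    pthread_t id[4];
    atomic_store(&count, 0);
    for (int i = 0; i < 4; ++i)
        pthread_create(&id[i], NULL, test_func, NULL);
    for (int i = 0; i < 4; ++i)
        pthread_join(id[i], NULL);
    return atomic_load(&count);
}
```

With hand-written asm the same structure applies: reload *reg (or use the value cmpxchg leaves in EAX) inside the loop, not in the caller.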
Your results look right to me. lock cmpxchg succeeds most of the time, but will fail if another core beat you to the punch. You're doing 20k attempts to cmpxchg count+1, not 20k atomic increments.
To write __sync_fetch_and_add with inline asm, you'll want to use lock xadd. It's specifically designed to implement fetch-add.
Implementing other operations, like fetch-or or fetch-and, requires a CAS retry loop if you actually need the old value. So you could make a version of the function that doesn't return the old value, and is just a sync-and without the fetch, using the lock and instruction with a memory destination. (Compiler builtins can make this optimization based on whether the result is needed or not, but an inline asm implementation doesn't get a chance to choose asm based on that information.)
For efficiency, remember that and, or, add and many other instructions can use immediate operands, so a "re"(src) constraint would be appropriate (not "ri" for int64_t on x86-64, because that would allow immediates too large. https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html). But cmpxchg, xadd, and xchg can't use immediates, of course.
I'd suggest looking at compiler output for modern gcc (e.g. on http://godbolt.org/) for functions using the builtin, to see what compilers do.
But beware that inline asm can compile correctly given one set of surrounding code, but not the way you expect given different code. e.g. if the surrounding code copied a value after using CAS on it (probably unlikely), the compiler might decide to give the asm template two different memory operands for "=m"(*reg) and "m"(*reg), but your asm template assumes they will always be the same address.
IDK if gcc4.1 supports it, but "+m"(*reg) would declare a read/write memory operand. Otherwise perhaps you can use a matching constraint to say that the input is in the same location as an earlier operand, like "0"(*reg). But that might only work for registers, not memory, I didn't check.
"a" (oldval) is a bug: cmpxchg writes EAX on failure.
It's not ok to tell the compiler you leave a reg unmodified, and then write an asm template that does modify it. You will get unpredictable behaviour from stepping on the compiler's toes.
See c inline assembly getting "operand size mismatch" when using cmpxchg for a safe inline-asm wrapper for lock cmpxchg. It's written for gcc6 flag-output, so you'll have to back-port that and maybe a few other syntax details to crusty old gcc4.1.
That answer also addresses returning the old value so it doesn't have to be separately loaded.
(Using ancient gcc4.1 sounds like a bad idea to me, especially for writing multi-threaded code. So much room for error from porting working code with __sync builtins to hand-rolled asm. The risks of using a newer compiler, like stable gcc5.5 if not gcc7.4, are different but probably smaller.)
If you're going to rewrite code using __sync builtins, the sane thing would be to rewrite it using C11 stdatomic.h, or GNU C's more modern __atomic builtins that are intended to replace __sync.
The Linux kernel does successfully use inline asm for hand-rolled atomics, though, so it's certainly possible.
If you truly are in such a predicament, I would start with the following header file:
#ifndef SYNC_H
#define SYNC_H
#if defined(__x86_64__) || defined(__i386__)
static inline int sync_val_compare_and_swap_int(int *ptr, int oldval, int newval)
{
__asm__ __volatile__( "lock cmpxchgl %[newval], %[ptr]"
: "+a" (oldval), [ptr] "+m" (*ptr)
: [newval] "r" (newval)
: "memory" );
return oldval;
}
static inline int sync_fetch_and_add_int(int *ptr, int val)
{
__asm__ __volatile__( "lock xaddl %[val], %[ptr]"
: [val] "+r" (val), [ptr] "+m" (*ptr)
:
: "memory" );
return val;
}
static inline int sync_add_and_fetch_int(int *ptr, int val)
{
const int old = val;
__asm__ __volatile__( "lock xaddl %[val], %[ptr]"
: [val] "+r" (val), [ptr] "+m" (*ptr)
:
: "memory" );
return old + val;
}
static inline int sync_fetch_and_sub_int(int *ptr, int val) { return sync_fetch_and_add_int(ptr, -val); }
static inline int sync_sub_and_fetch_int(int *ptr, int val) { return sync_add_and_fetch_int(ptr, -val); }
/* Memory barrier */
static inline void sync_synchronize(void) { __asm__ __volatile__( "mfence" ::: "memory"); }
#else
#error Unsupported architecture.
#endif
#endif /* SYNC_H */
The same extended inline assembly works for both x86 and x86-64. Only the int type is implemented, and you do need to replace possible __sync_synchronize() calls with sync_synchronize(), and each __sync_...() call with sync_..._int().
To test, you can use e.g.
#include <stdlib.h>
#include <pthread.h>
#include <string.h>
#include <errno.h>
#include <stdio.h>
#include "sync.h"
#define THREADS 16
#define PERTHREAD 8000
void *test_func1(void *sumptr)
{
int *const sum = sumptr;
int n = PERTHREAD;
while (n-->0)
sync_add_and_fetch_int(sum, n + 1);
return NULL;
}
void *test_func2(void *sumptr)
{
int *const sum = sumptr;
int n = PERTHREAD;
while (n-->0)
sync_fetch_and_add_int(sum, n + 1);
return NULL;
}
void *test_func3(void *sumptr)
{
int *const sum = sumptr;
int n = PERTHREAD;
int oldval, curval, newval;
while (n-->0) {
curval = *sum;
do {
oldval = curval;
newval = curval + n + 1;
} while ((curval = sync_val_compare_and_swap_int(sum, oldval, newval)) != oldval);
}
return NULL;
}
static void *(*worker[3])(void *) = { test_func1, test_func2, test_func3 };
int main(void)
{
pthread_t thread[THREADS];
pthread_attr_t attrs;
int sum = 0;
int t, result;
pthread_attr_init(&attrs);
pthread_attr_setstacksize(&attrs, 65536);
for (t = 0; t < THREADS; t++) {
result = pthread_create(thread + t, &attrs, worker[t % 3], &sum);
if (result) {
fprintf(stderr, "Failed to create thread %d of %d: %s.\n", t+1, THREADS, strerror(result)); /* pthread_create returns the error code; it does not set errno */
exit(EXIT_FAILURE);
}
}
pthread_attr_destroy(&attrs);
for (t = 0; t < THREADS; t++)
pthread_join(thread[t], NULL);
t = THREADS * PERTHREAD * (PERTHREAD + 1) / 2;
if (sum == t)
printf("sum = %d (as expected)\n", sum);
else
printf("sum = %d (expected %d)\n", sum, t);
return EXIT_SUCCESS;
}
Unfortunately, I don't have an ancient version of GCC to test, so this has only been tested with GCC 5.4.0 and GCC-4.9.3 for x86 and x86-64 (using -O2) on Linux.
If you find any bugs or issues in the above, please let me know in a comment so I can verify and fix as needed.

How to make thread safe program?

On a 64-bit architecture pc, the next program should return the result 1.350948.
But it is not thread safe, and every time I run it, it gives (obviously) a different result.
#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <pthread.h>
const unsigned int ndiv = 1000;
double res = 0;
struct xval{
double x;
};
// Integrate exp(x^2 + y^2) over the unit circle on the
// first quadrant.
void* sum_function(void*);
void* sum_function(void* args){
unsigned int j;
double y = 0;
double localres = 0;
double x = ((struct xval*)args)->x;
for(j = 0; (x*x)+(y*y) < 1; y = (++j)*(1/(double)ndiv)){
localres += exp((x*x)+(y*y));
}
// Global variable:
res += (localres/(double)(ndiv*ndiv));
// This is not thread safe!
// mutex? futex? lock? semaphore? other?
}
int main(void){
unsigned int i;
double x = 0;
pthread_t thr[ndiv];
struct xval* xvarray;
if((xvarray = calloc(ndiv, sizeof(struct xval))) == NULL){
exit(EXIT_FAILURE);
}
for(i = 0; x < 1; x = (++i)*(1/(double)ndiv)){
xvarray[i].x = x;
pthread_create(&thr[i], NULL, &sum_function, &xvarray[i]);
// Should check return value.
}
for(i = 0; i < ndiv; i++){
pthread_join(thr[i], NULL);
// If
// pthread_join(thr[i], &retval);
// res += *((double*)retval) <-?
// there would be no problem.
}
printf("The integral of exp(x^2 + y^2) over the unit circle on\n\
the first quadrant is: %f\n", res);
return 0;
}
How can it be thread safe?
NOTE: I know that 1000 threads is not a good way to solve this problem, but I really really want to know how to write thread-safe c programs.
Compile the above program with
gcc ./integral0.c -lpthread -lm -o integral
pthread_mutex_lock(&my_mutex);
// code to make thread safe
pthread_mutex_unlock(&my_mutex);
Declare my_mutex either as a statically initialized global variable, like pthread_mutex_t my_mutex = PTHREAD_MUTEX_INITIALIZER;, or initialize it at run time using pthread_mutex_t my_mutex; pthread_mutex_init(&my_mutex, NULL);. Also don't forget to #include <pthread.h> and to link your program with -lpthread when compiling.
The question (in a comment in the code):
// mutex? futex? lock? semaphore? other?
Answer: mutex.
See pthread_mutex_init, pthread_mutex_lock, and pthread_mutex_unlock.
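As an alternative to the mutex, the idea hinted at in the question's own comments (collecting each partial result through pthread_join) avoids the shared write altogether. The sketch below is an assumption-laden illustration, not the asker's program: struct task, run_join_demo and the placeholder computation x * 2.0 are invented, and the real per-column sum would go where the placeholder is.

```c
#include <pthread.h>

struct task {
    double x;        /* input for this thread */
    double partial;  /* output, written only by the owning thread */
};

static void *sum_function(void *arg)
{
    struct task *t = arg;
    t->partial = t->x * 2.0;  /* placeholder for the real per-column sum */
    return &t->partial;       /* storage outlives the thread: it lives in the caller */
}

/* Hypothetical driver: main is the only thread that touches res. */
double run_join_demo(void)
{
    enum { N = 4 };
    pthread_t thr[N];
    struct task tasks[N];
    double res = 0;

    for (int i = 0; i < N; i++) {
        tasks[i].x = i;
        tasks[i].partial = 0;
        pthread_create(&thr[i], NULL, sum_function, &tasks[i]);
    }
    for (int i = 0; i < N; i++) {
        void *retval;
        pthread_join(thr[i], &retval);
        res += *(double *)retval;  /* summed sequentially, so no lock is needed */
    }
    return res;
}
```

Because the additions happen in a fixed order in one thread, the floating-point result is also reproducible from run to run, which a mutex around res += alone does not guarantee.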

How to modify structure elements atomically without using locks in C?

I would like to modify some elements of a structure atomically.
My current implementation uses mutexes to protect the critical code, and can be seen below.
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <pthread.h>
pthread_mutex_t thread_mutex = PTHREAD_MUTEX_INITIALIZER;
#define ITER 100000
typedef struct global_status {
int32_t context_delta;
uint32_t global_access_count;
} global_status_t;
global_status_t g_status;
void *context0(void *ptr)
{
unsigned int iter = ITER;
while (iter--) {
wait_event_from_device0();
pthread_mutex_lock(&thread_mutex);
g_status.context_delta++;
g_status.global_access_count++;
pthread_mutex_unlock(&thread_mutex);
}
return NULL;
}
void *context1(void *ptr)
{
unsigned int iter = ITER;
while (iter--) {
wait_event_from_device1();
pthread_mutex_lock(&thread_mutex);
g_status.context_delta--;
g_status.global_access_count++;
pthread_mutex_unlock(&thread_mutex);
}
return NULL;
}
int main(int argc, char **argv)
{
pthread_t tid0, tid1;
int iret;
if ((iret = pthread_create(&tid0, NULL, context0, NULL))) {
fprintf(stderr, "context0 creation error!\n");
return EXIT_FAILURE;
}
if ((iret = pthread_create(&tid1, NULL, context1, NULL))) {
fprintf(stderr, "context1 creation error!\n");
return EXIT_FAILURE;
}
pthread_join(tid0, NULL);
pthread_join(tid1, NULL);
printf("%d, %d\n", g_status.context_delta, g_status.global_access_count);
return 0;
}
I am planning to port this code to an RTOS which does not support POSIX, and I would like to do this operation atomically without using mutexes or disabling/enabling interrupts.
How can I do this operation?
Is it possible by using 'atomic compare and swap function' (CAS)?
It seems that in your example you have two threads servicing two different devices. You may be able to do away with locking completely by using a per-device structure; the global status is then the aggregate of all per-device statistics. If you do need locks, you can use CAS, LL/SC, or any supported underlying atomic construct.
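The per-device idea could look roughly like this. It is a sketch under assumptions: pthreads are used only to make the demo runnable (the target RTOS would use its own task API), the step field and the names device_status, context and run_per_device_demo are invented, and the aggregate is read only after both workers have finished. Reading it while they run would need atomics or some other synchronization.

```c
#include <pthread.h>
#include <stdint.h>

#define ITER 1000

struct device_status {
    int32_t  context_delta;
    uint32_t access_count;
    int32_t  step;  /* +1 for device 0, -1 for device 1 (illustrative) */
};

/* Each thread writes only its own structure, so no lock is required. */
static void *context(void *arg)
{
    struct device_status *mine = arg;
    for (unsigned iter = 0; iter < ITER; iter++) {
        mine->context_delta += mine->step;
        mine->access_count++;
    }
    return NULL;
}

/* Hypothetical driver: returns the aggregated access count,
   or -1 if the aggregated delta is not zero. */
long run_per_device_demo(void)
{
    struct device_status dev[2] = { { 0, 0, +1 }, { 0, 0, -1 } };
    pthread_t tid[2];

    for (int i = 0; i < 2; i++)
        pthread_create(&tid[i], NULL, context, &dev[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(tid[i], NULL);

    int32_t  delta = dev[0].context_delta + dev[1].context_delta;
    uint32_t total = dev[0].access_count + dev[1].access_count;
    return delta == 0 ? (long)total : -1;
}
```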
What I do is create a union with all the fields I want to change at the same time, like this:
union TData {
struct {
int m_field1;
unsigned short m_field2 : 2,
m_field3 : 1;
BYTE m_field4;
};
unsigned long long m_n64; // all fields viewed as one 64-bit word
TData(const TData& r) { m_n64 = r.m_n64; }
};
You embed unions like that inside your larger struct like this:
struct {
...
volatile TData m_Data;
...
} TBiggerStruct;
Then I do something like this:
while (1) {
TData Old = BiggerSharedStruct.m_Data, New = Old;
New.m_field1++;
New.m_field4--;
if (CAS(&BiggerSharedStruct.m_Data.m_n64, Old.m_n64, New.m_n64))
break; // success
}
I do a lot of packing of fields that I want to change at the same time into the smallest possible 16, 32, or 64 bit structure. I think 128 bit stuff on Intel is not as fast as the 64 bit stuff, so I avoid it. I haven't benchmarked it in a while, so I could be wrong on that.
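Applied to the question's two 32-bit fields, the same packing trick can be sketched in plain C11 as follows. This is an illustration, not the answerer's code: it uses shifts instead of a union, assumes the target has a lock-free 64-bit compare-and-swap (on some 32-bit targets this may require extra library support), and the names pack, update_status and run_packed_demo are made up. pthreads appear only to make the demo runnable.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

#define ITER 1000

/* High 32 bits: context_delta, low 32 bits: global_access_count. */
static _Atomic uint64_t g_status;

static uint64_t pack(int32_t delta, uint32_t count)
{
    return ((uint64_t)(uint32_t)delta << 32) | count;
}

/* Update both fields in one atomic step using a CAS retry loop. */
static void update_status(int32_t step)
{
    uint64_t old = atomic_load(&g_status);
    uint64_t new_value;
    do {
        int32_t  delta = (int32_t)(uint32_t)(old >> 32) + step;
        uint32_t count = (uint32_t)old + 1;
        new_value = pack(delta, count);
        /* a failed CAS reloads old, so the fields are recomputed */
    } while (!atomic_compare_exchange_weak(&g_status, &old, new_value));
}

static void *context(void *arg)
{
    int32_t step = *(int32_t *)arg;
    for (unsigned iter = 0; iter < ITER; iter++)
        update_status(step);
    return NULL;
}

/* Hypothetical driver: returns the access count, or -1 if delta is not zero. */
long run_packed_demo(void)
{
    static int32_t steps[2] = { +1, -1 };
    pthread_t tid[2];

    atomic_store(&g_status, 0);
    for (int i = 0; i < 2; i++)
        pthread_create(&tid[i], NULL, context, &steps[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(tid[i], NULL);

    uint64_t v = atomic_load(&g_status);
    int32_t  delta = (int32_t)(uint32_t)(v >> 32);
    uint32_t count = (uint32_t)v;
    return delta == 0 ? (long)count : -1;
}
```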

Resources