I just implemented a reader-writer lock in C. I want to limit the number of readers, so I use 'num' to count them. I'm not sure whether this implementation has potential data races or deadlocks. Could you help me figure them out, please?
Another question: can I remove the 'spin_lock' in struct _rwlock in some way? Thanks!
#define MAX_READER 16
typedef struct _rwlock *rwlock;
struct _rwlock{
spin_lock lk;
uint32_t num;
};
void wr_lock(rwlock lock){
while (1){
if (lock->num > 0) continue;
lock(lock->lk);
lock->num += MAX_READER;
return;
}
}
void wr_unlock(rwlock lock){
lock->num -= MAX_READER;
unlock(lock->lk);
}
void rd_lock(rwlock lock){
while (1){
if (lock->num >= MAX_READER) continue;
atom_inc(num);
return;
}
}
void rd_unlock(rwlock lock){
atom_dec(num);
}
Short answer: Yes, there are severe issues here. I don't know what synchronization library you are using, but you are not protecting access to shared data and you will waste tons of CPU cycles on your loops in rd_lock() and wr_lock(). Spin locks should be avoided in virtually all cases (there are exceptions though).
In wr_lock (and similarly in rd_lock):
while (1){
if (lock->num > 0) continue;
This is wrong. If you don't somehow synchronize, you aren't guaranteed to see changes from other threads. If this were the only problem you could perhaps acquire the lock and then check the count.
In rd_lock:
atom_inc(num);
This doesn't play well with the non-atomic += and -= in the writer functions, because it can interrupt them. Same for the decrement in rd_unlock.
rd_lock can return while a thread holds the lock as writer -- this isn't the usual semantics of a reader-writer lock, and it means that whatever your rw-lock is supposed to protect, it will not protect it.
If you are using pthreads, then it already has a rwlock. On Windows consider SRWlocks (never used 'em myself). For portable code, build your rwlock using a condition variable (or maybe two -- one for readers and one for writers). That is, insofar as multi-threaded code in C can be portable. C11 has a condition variable, and if there's a pre-C11 threads implementation out there that doesn't, I don't want to have to use it ;-)
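As a rough illustration of the condition-variable approach, here is a minimal sketch assuming pthreads. The names (bounded_rwlock, brw_*, MAX_READERS, brw_demo) are made up for this example and are not from the question. It uses a single condition variable and makes no attempt at fairness, so a steady stream of readers can starve writers.

```c
#include <pthread.h>

#define MAX_READERS 16  /* hypothetical limit, mirroring MAX_READER in the question */

/* All fields are protected by mtx; cond is broadcast whenever the state changes. */
typedef struct {
    pthread_mutex_t mtx;
    pthread_cond_t  cond;
    unsigned        readers;  /* threads currently holding a read lock */
    int             writer;   /* 1 while a thread holds the write lock */
} bounded_rwlock;

void brw_init(bounded_rwlock *l)
{
    pthread_mutex_init(&l->mtx, NULL);
    pthread_cond_init(&l->cond, NULL);
    l->readers = 0;
    l->writer = 0;
}

void brw_rdlock(bounded_rwlock *l)
{
    pthread_mutex_lock(&l->mtx);
    /* Sleep (instead of spinning) while a writer is active or the reader limit is hit. */
    while (l->writer || l->readers >= MAX_READERS)
        pthread_cond_wait(&l->cond, &l->mtx);
    l->readers++;
    pthread_mutex_unlock(&l->mtx);
}

void brw_rdunlock(bounded_rwlock *l)
{
    pthread_mutex_lock(&l->mtx);
    l->readers--;
    pthread_cond_broadcast(&l->cond);
    pthread_mutex_unlock(&l->mtx);
}

void brw_wrlock(bounded_rwlock *l)
{
    pthread_mutex_lock(&l->mtx);
    while (l->writer || l->readers > 0)
        pthread_cond_wait(&l->cond, &l->mtx);
    l->writer = 1;
    pthread_mutex_unlock(&l->mtx);
}

void brw_wrunlock(bounded_rwlock *l)
{
    pthread_mutex_lock(&l->mtx);
    l->writer = 0;
    pthread_cond_broadcast(&l->cond);
    pthread_mutex_unlock(&l->mtx);
}

/* Small demo: four writer threads increment a counter under the write lock. */
static bounded_rwlock demo_lock;
static long demo_counter;

static void *demo_writer(void *arg)
{
    (void)arg;
    for (int i = 0; i < 10000; i++) {
        brw_wrlock(&demo_lock);
        demo_counter++;
        brw_wrunlock(&demo_lock);
    }
    return NULL;
}

long brw_demo(void)
{
    pthread_t t[4];
    brw_init(&demo_lock);
    demo_counter = 0;
    for (int i = 0; i < 4; i++)
        pthread_create(&t[i], NULL, demo_writer, NULL);  /* error handling omitted */
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    return demo_counter;
}
```

Because every state change happens with mtx held, the checks and the updates of readers and writer cannot interleave the way the unsynchronized reads of num do in the question.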
#include <stdio.h>
#include <pthread.h>
long mails = 0;
int lock = 0;
void *routine(void *arg)
{
printf("Thread Start\n");
for (long i = 0; i < 100000; i++)
{
while (lock)
{
}
lock = 1;
mails++;
lock = 0;
}
printf("Thread End\n");
return NULL;
}
int main(int argc, char *argv[])
{
pthread_t p1, p2;
if (pthread_create(&p1, NULL, &routine, NULL) != 0)
{
return 1;
}
if (pthread_create(&p2, NULL, &routine, NULL) != 0)
{
return 2;
}
if (pthread_join(p1, NULL) != 0)
{
return 3;
}
if (pthread_join(p2, NULL) != 0)
{
return 4;
}
printf("Number of mails: %ld \n", mails);
return 0;
}
In the code above, each thread runs a for loop that increments mails 100000 times. To avoid a race condition, I use a lock variable together with a while loop. However, the while loop in the routine function does not prevent the race condition, and the final value of mails is wrong.
In C, the compiler may assume that a (global) variable is not modified by other threads, except in a few cases (e.g. volatile variables, atomic accesses). This means the compiler can assume lock is not modified, and while (lock) {} can be replaced with an infinite loop. In fact, concurrent non-atomic accesses to lock are a data race, which is undefined behaviour, and a loop without any visible side effect may be assumed to terminate. This means the compiler can remove the loop (or generate wrong code). The compiler can also remove the lock = 1 statement, since it is followed by lock = 0. The resulting code is bogus. Note that even if the compiler generated correct code, some processors (e.g. ARM and PowerPC, AFAIK) can reorder memory accesses, resulting in bogus behaviour.
To make sure accesses between multiple threads are correct, you need at least atomic accesses on lock. Relaxed atomic accesses must additionally be combined with proper memory barriers. Note that while (lock) {} results in a spin lock. Spin locks are known to be a pretty bad solution in many cases, unless you really know what you are doing and understand all the consequences (when in doubt, don't use them).
Generally, it is better to use mutexes, semaphores and condition variables in this case. Mutexes are generally implemented using an atomic boolean flag internally (with the right memory barriers, so you do not need to care about that). When the flag is marked as locked, an OS sleeping function is called. The sleeping thread wakes up when the lock has been released by another thread. This is possible because the thread releasing a lock can send a wake-up signal. In pre-C11 code, you can use pthreads for that (do not forget the initialization). Since C11, you can do it directly with the standard threads API (<threads.h>).
If you really want a spinlock, you need something like:
#include <stdatomic.h>
atomic_flag lock = ATOMIC_FLAG_INIT;
void *routine(void *arg)
{
    printf("Thread Start\n");
    for (long i = 0; i < 100000; i++)
    {
        /* spin until the flag was previously clear, i.e. we now own it */
        while (atomic_flag_test_and_set(&lock)) {}
        mails++;
        atomic_flag_clear(&lock);
    }
    printf("Thread End\n");
    return NULL;
}
However, since you are already using pthreads, you're better off using a pthread_mutex_t.
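For comparison, a minimal sketch of the mutex variant could look like this. The helper name count_mails and the mutex name mails_mutex are made up for the example; the printf calls from the question are left out.

```c
#include <pthread.h>

static long mails = 0;
static pthread_mutex_t mails_mutex = PTHREAD_MUTEX_INITIALIZER;

static void *routine(void *arg)
{
    (void)arg;
    for (long i = 0; i < 100000; i++) {
        pthread_mutex_lock(&mails_mutex);   /* sleeps if another thread holds it */
        mails++;
        pthread_mutex_unlock(&mails_mutex);
    }
    return NULL;
}

/* Runs two threads and returns the final count, or -1 if a thread could not be created. */
long count_mails(void)
{
    pthread_t p1, p2;

    mails = 0;
    if (pthread_create(&p1, NULL, routine, NULL) != 0)
        return -1;
    if (pthread_create(&p2, NULL, routine, NULL) != 0) {
        pthread_join(p1, NULL);
        return -1;
    }
    pthread_join(p1, NULL);
    pthread_join(p2, NULL);
    return mails;
}
```

Calling count_mails() from main and printing the result should give 200000 on every run, since each increment happens with the mutex held.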
Jérôme Richard told you about ways in which the compiler could optimize the sense out of your code, but even if you turned all the optimizations off, you still would be left with a race condition. You wrote
while (lock) { }
lock=1;
...critical section...
lock=0;
The problem with that is, suppose lock==0. Two threads racing toward that critical section at the same time could both test lock, and they could both find that lock==0. Then they both would set lock=1, and they both would enter the critical section...
...at the same time.
In order to implement a spin lock,* you need some way for one thread to prevent other threads from accessing the lock variable in between when the first thread tests it, and when the first thread sets it. You need an atomic (i.e., indivisible) "test and set" operation.
Most computer architectures have some kind of specialized op-code that does what you want. It has names like "test and set," "compare and exchange," "load-linked and store-conditional," etc. Chris Dodd's answer shows you how to use a standard C library function that does the right thing on whatever CPU you happen to be using...
...But don't forget what Jérôme said.*
* Jérôme told you that spin locks are a bad idea.
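To make the "compare and exchange" idea concrete, here is a small sketch using C11 <stdatomic.h>. It is only an illustration of the indivisible test-and-set step, not a recommendation to use spin locks; all names (lock_word, spin_acquire, spin_release, cas_demo) are made up for the example.

```c
#include <pthread.h>
#include <stdatomic.h>

static atomic_int lock_word = 0;   /* 0 = free, 1 = held */
static long shared_total = 0;      /* protected by lock_word */

static void spin_acquire(void)
{
    int expected = 0;
    /* Atomically: "if lock_word == 0, set it to 1". The test and the set
       cannot be separated by another thread. On failure, expected is
       overwritten with the current value, so reset it before retrying. */
    while (!atomic_compare_exchange_weak_explicit(&lock_word, &expected, 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
        expected = 0;
}

static void spin_release(void)
{
    atomic_store_explicit(&lock_word, 0, memory_order_release);
}

static void *worker(void *arg)
{
    (void)arg;
    for (long i = 0; i < 100000; i++) {
        spin_acquire();
        shared_total++;            /* critical section */
        spin_release();
    }
    return NULL;
}

/* Runs two threads and returns the final total (thread creation errors are ignored). */
long cas_demo(void)
{
    pthread_t a, b;

    shared_total = 0;
    pthread_create(&a, NULL, worker, NULL);
    pthread_create(&b, NULL, worker, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return shared_total;
}
```

Two threads that both see lock_word == 0 can no longer both win: the compare-exchange succeeds for exactly one of them, and the other keeps spinning.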
Each thread of my program has its own log file. In my SIGHUP handler I want to notify those threads that when a new log message arrives, they need to reopen their log files.
I want a lock-free solution based purely on flags and counters. (I do have a thread-local context structure for another purpose, so I can add new fields there).
If there was just one logging thread, I would do:
static int need_reopen = 0;
void sighancont(int signo)
...
case SIGHUP:
need_reopen = 1;
break;
...
}
void log(char *msg) {
if (need_reopen) {
need_reopen = 0;
reopen_log();
}
...
}
Of course, if there are multiple logging threads, a simple flag won't do. I'm thinking of something like this:
static volatile int reopen_counter = 0;
void sighancont(int signo)
...
case SIGHUP:
__sync_fetch_and_add(&reopen_counter, 1);
break;
...
}
void log(struct ctx_st *ctx, char *msg) {
int c = reopen_counter;
if (ctx->reopen_counter != c) {
ctx->reopen_counter = c;
reopen_log();
}
...
}
This way the logging threads are supposed to catch-up with the global counter. If the program receives SIGHUP multiple times, log files will be reopened only once.
The only way I can see to break this is to send SIGHUP ~4 billion times.
Is there a better (but still simple) algorithm, e.g. with reference counting?
Your solution is simple and efficient. This is kind of a seqlock.
A few notes, to clear possible confusion from comments:
There is no "atomic variable", only atomic instructions. std::atomic and friends are just syntactic sugar around atomic ops, so you're perfectly OK there.
The counter doesn't have to be volatile, but the accesses have to be. When you write atomic_read(x) you actually say *(volatile int*)&x.
The volatile qualifier causes all accesses to the variable to be done from memory, while you don't necessarily need that.
But, here as well, you're perfectly ok, since you read the variable into a local.
You can update the counter non-atomically if this is the one and only writer (don't forget to make it atomic_write if you remove the volatile). This would be a very small performance improvement.
The only cost here is in the log threads that have to pay for main memory access after the counter is updated. You should expect 200 cycles or so (x2 on other NUMA node)
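For what it's worth, the same counter scheme can be written with C11 atomics instead of volatile plus __sync builtins. This is a sketch under assumptions: atomic_uint is lock-free on the target (needed for use in a signal handler), and struct log_ctx, log_check_reopen and install_sighup_handler are hypothetical names, with reopen_calls standing in for the real reopen_log().

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdatomic.h>
#include <string.h>

static atomic_uint reopen_generation = 0;   /* bumped by the signal handler */

struct log_ctx {                 /* hypothetical per-thread context */
    unsigned seen_generation;    /* last generation this thread acted on */
    unsigned reopen_calls;       /* stands in for reopen_log() in this sketch */
};

static void on_sighup(int signo)
{
    (void)signo;
    /* Atomic operations on lock-free atomics are allowed in signal handlers. */
    atomic_fetch_add_explicit(&reopen_generation, 1, memory_order_relaxed);
}

/* Call at the start of each log(); returns 1 if the log was (notionally) reopened. */
int log_check_reopen(struct log_ctx *ctx)
{
    unsigned g = atomic_load_explicit(&reopen_generation, memory_order_relaxed);

    if (ctx->seen_generation == g)
        return 0;
    ctx->seen_generation = g;    /* catch up, however many SIGHUPs arrived */
    ctx->reopen_calls++;
    return 1;
}

int install_sighup_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sighup;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGHUP, &sa, NULL);
}
```

Relaxed ordering is enough here because the counter does not publish any other data; each thread only needs to notice that the value changed.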
With threads, I have mutexes and condition variables, so I can coordinate them easily.
However, if I create two processes with fork(), how can I make them alternate?
Or, is there any way to create a "critical section" for processes?
I intended to write a program that prints "r" and "w" alternately; here is the code.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int pipe_1[2];
int flag = 0;
void r();
void w();
int main() {
pipe(pipe_1);
if(fork())
r();
else
w();
}
void r() {
int count = 0;
while(1) {
printf("%d \n", flag);
if (count == 10)
exit(0);
if(flag == 0) {
puts("r");
flag = 1;
count++;
while(flag == 1)
;
}
}
}
void w() {
while(1) {
if(flag == 1) {
puts("w");
flag = 0;
while(flag == 0)
;
}
}
}
The output is only:
0
r
Then it seems to enter an infinite loop.
What's the problem?
And what's the right way to make alternating processes?
Thanks.
This may be overwhelming, but there are TONS of primitives you could use. See here for a list.
http://beej.us/guide/bgipc/output/html/singlepage/bgipc.html
Glancing at the list, just about all of those could be used. Some are more like traditional pthread synchronization primitives, others are higher-level, but can still be used for synchronization.
For example, you could just open a TCP socket between the two and send messages when it's the other side's turn. Maybe with an incrementing number.
Something perhaps more traditional would be semaphores:
http://beej.us/guide/bgipc/output/html/singlepage/bgipc.html#semaphores
Also, this assumes a modern unix-like platform. Windows is likely very different.
It looks like you have a pipe already, so you can use that to have each side send a message to the other after it's done its print. The other side would do a blocking read, return when the message arrives, do its print, send a message back, and go back to a blocking read.
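A minimal sketch of that ping-pong, assuming a POSIX system. It uses two pipes (one per direction) rather than the single pipe from the question, so that a process cannot read back its own message; the function name alternate and the pipe names are made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Prints "r" (parent) and "w" (child) alternately, `rounds` times each.
   Returns 0 on success, -1 on error. */
int alternate(int rounds)
{
    int to_child[2], to_parent[2];
    char token = 'x';

    if (pipe(to_child) == -1)
        return -1;
    if (pipe(to_parent) == -1) {
        close(to_child[0]);
        close(to_child[1]);
        return -1;
    }

    fflush(stdout);                 /* do not duplicate buffered output in the child */
    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {                 /* child: the "w" side */
        close(to_child[1]);
        close(to_parent[0]);
        for (int i = 0; i < rounds; i++) {
            if (read(to_child[0], &token, 1) != 1)    /* block until it is our turn */
                _exit(1);
            puts("w");
            fflush(stdout);
            if (write(to_parent[1], &token, 1) != 1)  /* hand the turn back */
                _exit(1);
        }
        _exit(0);
    }

    /* parent: the "r" side */
    close(to_child[0]);
    close(to_parent[1]);
    int ok = 1;
    for (int i = 0; i < rounds && ok; i++) {
        puts("r");
        fflush(stdout);
        if (write(to_child[1], &token, 1) != 1 ||     /* wake the child ... */
            read(to_parent[0], &token, 1) != 1)       /* ... and wait for its reply */
            ok = 0;
    }
    close(to_child[1]);
    close(to_parent[0]);

    int status = 0;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return (ok && WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

The blocking read is what replaces the busy-wait on flag: the kernel puts the waiting process to sleep until the other side writes its one-byte token.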
They are separate processes, so each has its own flag; r changing its copy doesn't affect w's.
In order for two processes to communicate with each other without sharing the same address space (like threads do), they must use Inter-Process Communication (IPC). Some of the IPC mechanisms are: shared memory, semaphores, pipes, sockets, message queues and more. Most of the time, IPC mechanisms are operating system specific. However, many ideas are general enough that it is possible to come up with a portable implementation, which the Boost project did as part of the Boost.Interprocess library. What I think you should take a look at first is the Synchronization Mechanisms section. Note, however, that this is a C++ library. I am not aware of any C library that is as good as Boost.
Hope it helps. Good Luck!
Let's imagine that I have a few worker threads such as follows:
while (1) {
do_something();
if (flag_isset())
do_something_else();
}
We have a couple of helper functions for checking and setting a flag:
void flag_set() { global_flag = 1; }
void flag_clear() { global_flag = 0; }
int flag_isset() { return global_flag; }
Thus the threads keep calling do_something() in a busy-loop and in case some other thread sets global_flag the thread also calls do_something_else() (which could for example output progress or debugging information when requested by setting the flag from another thread).
My question is: Do I need to do something special to synchronize access to the global_flag? If yes, what exactly is the minimum work to do the synchronization in a portable way?
I have tried to figure this out by reading many articles but I am still not quite sure of the correct answer... I think it is one of the following:
A: No need to synchronize because setting or clearing the flag does not create race conditions:
We just need to define the flag as volatile to make sure that it is really read from the shared memory every time it is being checked:
volatile int global_flag;
It might not propagate to other CPU cores immediately but will sooner or later, guaranteed.
B: Full synchronization is needed to make sure that changes to the flag are propagated between threads:
Setting the shared flag in one CPU core does not necessarily make it seen by another core. We need to use a mutex to make sure that flag changes are always propagated by invalidating the corresponding cache lines on other CPUs. The code becomes as follows:
volatile int global_flag;
pthread_mutex_t flag_mutex = PTHREAD_MUTEX_INITIALIZER;
void flag_set() { pthread_mutex_lock(&flag_mutex); global_flag = 1; pthread_mutex_unlock(&flag_mutex); }
void flag_clear() { pthread_mutex_lock(&flag_mutex); global_flag = 0; pthread_mutex_unlock(&flag_mutex); }
int flag_isset()
{
    int rc;
    pthread_mutex_lock(&flag_mutex);
    rc = global_flag;
    pthread_mutex_unlock(&flag_mutex);
    return rc;
}
C: Synchronization is needed to make sure that changes to the flag are propagated between threads:
This is the same as B, but instead of using a mutex on both sides (reader & writer) we use it only on the writing side. Because the logic does not require synchronization, we just need to synchronize (invalidate other caches) when the flag is changed:
volatile int global_flag;
pthread_mutex_t flag_mutex = PTHREAD_MUTEX_INITIALIZER;
void flag_set() { pthread_mutex_lock(&flag_mutex); global_flag = 1; pthread_mutex_unlock(&flag_mutex); }
void flag_clear() { pthread_mutex_lock(&flag_mutex); global_flag = 0; pthread_mutex_unlock(&flag_mutex); }
int flag_isset() { return global_flag; }
This would avoid continuously locking and unlocking the mutex when we know that the flag is rarely changed. We are just using a side-effect of Pthreads mutexes to make sure that the change is propagated.
So, which one?
I think A and B are the obvious choices, B being safer. But how about C?
If C is ok, is there some other way of forcing the flag change to be visible on all CPUs?
There is one somewhat related question: Does guarding a variable with a pthread mutex guarantee it's also not cached? ...but it does not really answer this.
The 'minimum amount of work' is an explicit memory barrier. The syntax depends on your compiler; on GCC you could do:
void flag_set() {
    global_flag = 1;
    __sync_synchronize();
}
void flag_clear() {
    global_flag = 0;
    __sync_synchronize();
}
int flag_isset() {
    int val;
    // Prevent the read from migrating backwards
    __sync_synchronize();
    val = global_flag;
    // and prevent it from being propagated forwards as well
    __sync_synchronize();
    return val;
}
These memory barriers accomplish two important goals:
They force a compiler flush. Consider a loop like the following:
for (int i = 0; i < 1000000000; i++) {
flag_set(); // assume this is inlined
local_counter += i;
}
Without a barrier, a compiler might choose to optimize this to:
for (int i = 0; i < 1000000000; i++) {
local_counter += i;
}
flag_set();
Inserting a barrier forces the compiler to write the variable back immediately.
They force the CPU to order its writes and reads. This is not so much an issue with a single flag - most CPU architectures will eventually see a flag that's set without CPU-level barriers. However the order might change. If we have two flags, and on thread A:
// start with only flag A set
flag_set_B();
flag_clear_A();
And on thread B:
a = flag_isset_A();
b = flag_isset_B();
assert(a || b); // can be false!
Some CPU architectures allow these writes to be reordered; you may see both flags being false (ie, the flag A write got moved first). This can be a problem if a flag protects, say, a pointer being valid. Memory barriers force an ordering on writes to protect against these problems.
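Note also that on some CPUs, it's possible to use 'acquire-release' barrier semantics to further reduce overhead. Such a distinction matters little on x86, where ordinary loads and stores already have acquire/release semantics. With older GCC versions it would require inline assembly, while newer ones offer the __atomic builtins and C11 <stdatomic.h>.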
A good overview of what memory barriers are and why they are needed can be found in the Linux kernel documentation directory. Finally, note that this code is enough for a single flag, but if you want to synchronize against any other values as well, you must tread very carefully. A lock is usually the simplest way to do things.
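With a C11 compiler, the same flag can be sketched using <stdatomic.h> rather than compiler-specific barriers. This is an illustration, not the code from the answer above: the payload variable, producer and consume_payload are made-up names that show what release/acquire ordering buys you when the flag guards other data.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool global_flag = false;
static int payload;   /* plain (non-atomic) data published through the flag */

void flag_set(void)   { atomic_store_explicit(&global_flag, true,  memory_order_release); }
void flag_clear(void) { atomic_store_explicit(&global_flag, false, memory_order_release); }
bool flag_isset(void) { return atomic_load_explicit(&global_flag, memory_order_acquire); }

static void *producer(void *arg)
{
    (void)arg;
    payload = 42;     /* ordinary write ... */
    flag_set();       /* ... ordered before the release store */
    return NULL;
}

/* Starts a producer thread, waits for the flag, and returns the payload it saw
   (or -1 if the thread could not be created). */
int consume_payload(void)
{
    pthread_t t;

    flag_clear();
    payload = 0;
    if (pthread_create(&t, NULL, producer, NULL) != 0)
        return -1;
    while (!flag_isset())
        ;             /* busy-wait only to keep the sketch short */
    int seen = payload;   /* the acquire load that saw `true` makes the write visible */
    pthread_join(t, NULL);
    return seen;
}
```

Because the accesses to global_flag are atomic, there is no data race on the flag itself, and the release/acquire pair orders the payload write before the read.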
You must not cause data races. A data race is undefined behavior, and the compiler is allowed to do anything and everything it pleases.
A humorous blog on the topic: http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong
A: There is no synchronization on the flag, so anything is allowed to happen. For example, the compiler is allowed to turn
flag_set();
while(weArentBoredLoopingYet())
    doSomethingVeryExpensive();
flag_clear();
into
while(weArentBoredLoopingYet())
    doSomethingVeryExpensive();
flag_set();
flag_clear();
Note: this kind of race is actually very popular. Your mileage may vary. On one hand, the de facto implementation of pthread_once involves a data race like this. On the other hand, it is undefined behavior. On most versions of gcc, you can get away with it because gcc chooses not to exercise its right to optimize this way in many cases, but it is not "spec" code.
B: full synchronization is the right call. This is simply what you have to do.
C: Synchronizing only on the writer could work, if you can prove that no one wants to read the flag while it is being written. The official definition of a data race (from the C++11 specification) is one thread writing to a variable while another thread can concurrently read or write the same variable. If your readers and writers all run at once, you still have a data race. However, if you can prove that the writer writes once, there is some synchronization, and then the readers all read, then the readers do not need synchronization.
As for caching, the rule is that a mutex lock/unlock synchronizes with all threads that lock/unlock the same mutex. This means you will not see any unusual caching effects (although under the hood, your processor can do spectacular things to make this run faster... it's just obliged to make it look like it wasn't doing anything special). If you don't synchronize, however, you get no guarantees that the other thread doesn't have changes to push that you need!
All of that being said, the question is really how much are you willing to rely on compiler specific behavior. If you want to write proper code, you need to do proper synchronization. If you are willing to rely on the compiler to be kind to you, you can get away with a lot less.
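If you have C++11, the easy answer is to use an atomic type such as std::atomic<bool>, which is designed to do exactly what you want AND is designed to synchronize correctly for you in most cases. (atomic_flag also exists, but before C++20 it can only be tested by setting it.)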
For the example you have posted, case A is sufficient provided that ...
Getting and setting the flag takes only one CPU instruction.
do_something_else() is not dependent upon the flag being set during the execution of that routine.
If getting and/or setting the flag takes more than one CPU instruction, then you must use some form of locking.
If do_something_else() is dependent upon the flag being set during the execution of that routine, then you must lock as in case C but the mutex must be locked before calling flag_isset().
Hope this helps.
Assigning incoming jobs to worker threads requires no locking. A typical example is a web server, where a request is caught by a main thread, and this main thread selects a worker. I'll try to explain it with some pseudocode.
main task {
    // do forever
    while (true) {
        // wait for job
        x = null;
        while (x == null) {
            sleep(some);
            x = grabTheJob();
        }
        // select worker
        bool found = false;
        for (n = 0; n < NUM_OF_WORKERS; n++) {
            if (workerList[n].getFlag() != AVAILABLE) continue;
            workerList[n].setJob(x);
            workerList[n].setFlag(DO_IT_PLS);
            found = true;
            break; // hand the job to one worker only
        }
        if (!found) panic("no free worker task! ouch!");
    } // while forever
} // main task
worker task {
    while (true) {
        while (getFlag() != DO_IT_PLS) sleep(some);
        setFlag(BUSY_DOING_THE_TASK);
        /// do it really
        setFlag(AVAILABLE);
    } // while forever
} // worker task
So, if there is one flag which one party sets to A and the other sets to B and C (the main task sets it to DO_IT_PLS, and the worker sets it to BUSY and AVAILABLE), there is no conflict. Try it with a "real-life" example, say, a teacher giving different tasks to students. The teacher selects a student and gives him/her a task. Then the teacher looks for the next available student. When a student is done, he/she goes back to the pool of available students.
UPDATE: just to clarify, there is only one main() thread and several (a configurable number of) worker threads. As main() runs as a single instance, there is no need to synchronize the selection and launch of the workers.
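In C, the flag hand-off only works reliably if the flag accesses are themselves atomic; plain int reads and writes would be a data race. Below is a rough sketch of the scheme with C11 atomics and pthreads. Everything here is hypothetical illustration: the jobs are just integers that each worker adds up, a SHUTDOWN state is added so the demo can finish, and the dispatcher yields instead of panicking when no worker is free.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

enum { AVAILABLE, DO_IT_PLS, BUSY, SHUTDOWN };
#define NUM_WORKERS 3

struct worker {
    pthread_t  thread;
    atomic_int flag;   /* main writes DO_IT_PLS/SHUTDOWN, the worker writes BUSY/AVAILABLE */
    int        job;    /* plain data, handed over through flag */
    long       done;   /* sum of processed jobs; read by main only after join */
};

static void *worker_main(void *arg)
{
    struct worker *w = arg;
    for (;;) {
        int f = atomic_load_explicit(&w->flag, memory_order_acquire);
        if (f == SHUTDOWN)
            return NULL;
        if (f != DO_IT_PLS) {
            sched_yield();
            continue;
        }
        atomic_store_explicit(&w->flag, BUSY, memory_order_relaxed);
        w->done += w->job;                       /* "do it really" */
        atomic_store_explicit(&w->flag, AVAILABLE, memory_order_release);
    }
}

/* Dispatches jobs 1..njobs to the workers and returns the sum they computed. */
long dispatch_jobs(int njobs)
{
    struct worker w[NUM_WORKERS];

    for (int i = 0; i < NUM_WORKERS; i++) {
        atomic_init(&w[i].flag, AVAILABLE);
        w[i].job = 0;
        w[i].done = 0;
        pthread_create(&w[i].thread, NULL, worker_main, &w[i]);  /* errors ignored */
    }

    for (int job = 1; job <= njobs; ) {
        int assigned = 0;
        for (int i = 0; i < NUM_WORKERS && !assigned; i++) {
            if (atomic_load_explicit(&w[i].flag, memory_order_acquire) != AVAILABLE)
                continue;
            w[i].job = job;                      /* write the job first ... */
            atomic_store_explicit(&w[i].flag, DO_IT_PLS, memory_order_release);
            assigned = 1;                        /* ... then publish it via the flag */
        }
        if (assigned)
            job++;
        else
            sched_yield();                       /* no free worker: try again */
    }

    long total = 0;
    for (int i = 0; i < NUM_WORKERS; i++) {
        while (atomic_load_explicit(&w[i].flag, memory_order_acquire) != AVAILABLE)
            sched_yield();
        atomic_store_explicit(&w[i].flag, SHUTDOWN, memory_order_release);
        pthread_join(w[i].thread, NULL);
        total += w[i].done;
    }
    return total;
}
```

Each flag value has exactly one party allowed to overwrite it, which is the teacher/student rule from the answer; the release/acquire pairs make sure the job value is visible before the worker reads it.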
So, I am trying to implement a concurrent queue in C. I have split the methods into "read methods" and "write methods". So, when accessing the write methods, like push() and pop(), I acquire a writer lock. And the same for the read methods. Also, we can have several readers but only one writer.
In order to get this to work in code, I have a mutex lock for the entire queue. And two condition locks - one for the writer and the other for the reader. I also have two integers keeping track of the number of readers and writers currently using the queue.
So my main question is - how to implement several readers accessing the read methods at the same time?
At the moment this is my general read method code (in pseudocode, not C; I am actually using pthreads):
mutex.lock();
while (nwriter > 0) {
wait(&reader);
mutex.unlock();
}
nreader++;
//Critical code
nreader--;
if (nreader == 0) {
signal(&writer)
}
mutex.unlock
So, imagine we have a reader which holds the mutex. Now any other reader which comes along, and tries to get the mutex, would not be able to. Wouldn't it block? Then how are many readers accessing the read methods at the same time?
Is my reasoning correct? If yes, how to solve the problem?
If this is not for an exercise, use read-write lock from pthreads (pthread_rwlock_* functions).
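A minimal sketch of what that looks like, assuming a toy fixed-size queue (q_items, q_len, queue_size and queue_push are made-up names for the example):

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

#define Q_MAX 64

static pthread_rwlock_t q_lock = PTHREAD_RWLOCK_INITIALIZER;
static int q_items[Q_MAX];
static int q_len = 0;

/* "Read method": any number of threads may hold the read lock at once. */
int queue_size(void)
{
    pthread_rwlock_rdlock(&q_lock);
    int n = q_len;
    pthread_rwlock_unlock(&q_lock);
    return n;
}

/* "Write method": the write lock excludes all readers and other writers.
   Returns 1 on success, 0 if the queue is full. */
int queue_push(int v)
{
    int ok = 0;

    pthread_rwlock_wrlock(&q_lock);
    if (q_len < Q_MAX) {
        q_items[q_len++] = v;
        ok = 1;
    }
    pthread_rwlock_unlock(&q_lock);
    return ok;
}
```

Readers do not block each other here because the read lock is shared; in the pseudocode from the question, the mutex stays held during the critical code, which is what serializes the readers.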
Also note that protecting individual calls with a lock still might not provide the necessary correctness guarantees. For example, typical code for popping an element from an STL queue is
if( !queue.empty() ) {
    data = queue.front();
    queue.pop();
}
And this will fail in concurrent code even if locks are used inside the queue methods, because conceptually this code must be an atomic transaction, but the implementation does not provide such a guarantee. A thread may pop a different element than the one it read with front(), or attempt to pop from an empty queue, etc.
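One common way around this is to expose the whole check-read-remove sequence as a single operation that runs under one lock. A sketch in C with pthreads, using a made-up fixed-capacity ring buffer (struct locked_queue, lq_init, lq_push, lq_try_pop are all hypothetical names):

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define Q_CAP 16

struct locked_queue {
    pthread_mutex_t mtx;
    int    items[Q_CAP];
    size_t head;    /* index of the oldest element */
    size_t count;   /* number of stored elements */
};

void lq_init(struct locked_queue *q)
{
    pthread_mutex_init(&q->mtx, NULL);
    q->head = 0;
    q->count = 0;
}

/* Returns false if the queue is full. */
bool lq_push(struct locked_queue *q, int v)
{
    bool ok = false;

    pthread_mutex_lock(&q->mtx);
    if (q->count < Q_CAP) {
        q->items[(q->head + q->count) % Q_CAP] = v;
        q->count++;
        ok = true;
    }
    pthread_mutex_unlock(&q->mtx);
    return ok;
}

/* The empty check, the read and the removal form one transaction under the
   lock, so no other thread can slip in between them.
   Returns false (and leaves *out untouched) if the queue is empty. */
bool lq_try_pop(struct locked_queue *q, int *out)
{
    bool ok = false;

    pthread_mutex_lock(&q->mtx);
    if (q->count > 0) {
        *out = q->items[q->head];
        q->head = (q->head + 1) % Q_CAP;
        q->count--;
        ok = true;
    }
    pthread_mutex_unlock(&q->mtx);
    return ok;
}
```

Callers then write `if (lq_try_pop(&q, &data)) { ... }` and never need a separate empty() call whose answer could already be stale.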
Please find the following read/write functions.
In my functions, I use canRead and canWrite mutexes and nReaders for the number of readers:
Write function:
lock(canWrite) // Wait if mutex is not free
// Write
unlock(canWrite)
Read function:
lock(canRead) // This mutex protect the nReaders
nReaders++ // Init value should be 0 (no readers)
if (nReaders == 1) // No other readers
{
lock(canWrite) // No writers can enter critical section
}
unlock(canRead)
// Read
lock(canRead)
nReaders--;
if (nReaders == 0) // No more readers
{
unlock(canWrite) // Writer can enter critical section
}
unlock(canRead)
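If you translate this to pthreads, note that canWrite may be released by a different reader than the one that acquired it, which a pthread mutex does not allow. A sketch that keeps the same logic but uses a POSIX semaphore for canWrite is shown below; it assumes unnamed semaphores (sem_init) are available, and shared_value plus the function names are made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <semaphore.h>

static pthread_mutex_t count_mtx = PTHREAD_MUTEX_INITIALIZER; /* protects n_readers */
static sem_t can_write;        /* binary semaphore: held by one writer or by the group of readers */
static int n_readers = 0;
static int shared_value = 0;   /* the data being protected */

/* Must be called once before any reader or writer runs; returns 0 on success. */
int rw_init(void)
{
    n_readers = 0;
    return sem_init(&can_write, 0, 1);
}

int read_value(void)
{
    pthread_mutex_lock(&count_mtx);
    if (++n_readers == 1)
        sem_wait(&can_write);  /* first reader locks writers out (EINTR handling omitted) */
    pthread_mutex_unlock(&count_mtx);

    int v = shared_value;      /* read section, shared with other readers */

    pthread_mutex_lock(&count_mtx);
    if (--n_readers == 0)
        sem_post(&can_write);  /* last reader lets writers in; may be another thread than the first */
    pthread_mutex_unlock(&count_mtx);
    return v;
}

void write_value(int v)
{
    sem_wait(&can_write);      /* wait until no reader and no other writer is active */
    shared_value = v;
    sem_post(&can_write);
}
```

As in the pseudocode, this scheme favors readers: as long as new readers keep arriving, a waiting writer never gets in.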
A classic solution is multiple-readers, single-writer.
A data structure begins with no readers and no writers.
You permit any number of concurrent readers.
When a writer comes along, you block him till all current readers complete; then you let him go (any new readers and writers which come along while the writer is blocked queue up behind him, in order).
You may try this library: lfqueue, a lock-free queue written in native C and suitable for cross-platform use.
For example:
int* int_data;
lfqueue_t my_queue;
if (lfqueue_init(&my_queue) == -1)
return -1;
/** Wrap This scope in other threads **/
int_data = (int*) malloc(sizeof(int));
assert(int_data != NULL);
*int_data = i++;
/*Enqueue*/
while (lfqueue_enq(&my_queue, int_data) == -1) {
printf("ENQ Full ?\n");
}
/** Wrap This scope in other threads **/
/*Dequeue*/
while ( (int_data = lfqueue_deq(&my_queue)) == NULL) {
printf("DEQ EMPTY ..\n");
}
// printf("%d\n", *(int*) int_data );
free(int_data);
/** End **/
lfqueue_destroy(&my_queue);