Multiple-writer thread-safe queue in C

I am working on a multi-threaded C application using pthreads. I have one thread which writes to a database (the database library is only safe to be used in a single thread), and several threads which are gathering data, processing it, and then need to send the results to the database thread for storage. I've seen it mentioned that it is "possible" to make a multiple-writer safe queue in C, but every place I see this mentioned simply says that it's "too complicated for this example" and merely demonstrates a single-writer safe queue.
I need the following things:
Efficient insertion and removal. I would assume that, as with any other queue, O(1) enqueueing and dequeueing is possible.
Dynamically allocated memory, i.e. a linked structure. I need to not have an arbitrary limit on the size of the queue, so an array really isn't what I'm looking for.
EDIT: Reading threads should not spin on an empty queue, since there are likely to be minutes' worth of time with no writes, followed by short bursts of large numbers of writes.

Sure, there are lockless queues. Based on what you've said in comments, though, performance here is not at all critical, since you're creating a thread per write anyway.
So, this is a standard use case for a condition variable. Make yourself a struct containing a mutex, a condition variable, a linked list (or circular buffer if you like), and a cancel flag:
write:
    lock the mutex
    (optionally - check the cancel flag to prevent leaks of stuff on the list)
    add the event to the list
    signal the condition variable
    unlock the mutex
read:
    lock the mutex
    while (list is empty AND cancel is false):
        wait on the condition variable with the mutex
    if cancel is false:    // or "if list non-empty", depending on cancel semantics
        remove an event from the list
    unlock the mutex
    return event if we have one, else NULL meaning "cancelled"
cancel:
    lock the mutex
    set the cancel flag
    (optionally - dispose of anything on the list, since the reader will quit)
    signal the condition variable
    unlock the mutex
If you're using a list with external nodes, then you might want to allocate the memory outside the mutex lock, just to reduce the time it's held for. But if you design the events with an intrusive list node, that's probably easiest.
Edit: you can also support multiple readers (with no portable guarantees for which one gets a given event) if in cancel you change the "signal" to "broadcast". Although you don't need it, it doesn't really cost anything either.
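As a concrete illustration, here is a minimal C sketch of this recipe using pthreads, assuming the events carry an intrusive next pointer (the event_queue type and the eq_* names are mine, not from any library):

#include <pthread.h>
#include <stddef.h>

typedef struct event {
    struct event *next;      /* intrusive list node */
    /* ... your payload ... */
} event;

typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    event *head, *tail;      /* FIFO linked list */
    int cancelled;
} event_queue;

void eq_push(event_queue *q, event *e)      /* write */
{
    pthread_mutex_lock(&q->mutex);
    e->next = NULL;
    if (q->tail) q->tail->next = e; else q->head = e;
    q->tail = e;
    pthread_cond_signal(&q->cond);
    pthread_mutex_unlock(&q->mutex);
}

event *eq_pop(event_queue *q)               /* read; NULL means "cancelled" */
{
    pthread_mutex_lock(&q->mutex);
    while (q->head == NULL && !q->cancelled)
        pthread_cond_wait(&q->cond, &q->mutex);
    event *e = NULL;
    if (!q->cancelled) {                    /* or test the list instead, per the cancel semantics above */
        e = q->head;
        q->head = e->next;
        if (q->head == NULL) q->tail = NULL;
    }
    pthread_mutex_unlock(&q->mutex);
    return e;
}

void eq_cancel(event_queue *q)              /* cancel */
{
    pthread_mutex_lock(&q->mutex);
    q->cancelled = 1;
    pthread_cond_broadcast(&q->cond);       /* broadcast so multiple readers all wake */
    pthread_mutex_unlock(&q->mutex);
}

Initialize the struct with PTHREAD_MUTEX_INITIALIZER / PTHREAD_COND_INITIALIZER (or the corresponding _init calls) before use. Since eq_cancel already broadcasts, the multiple-reader variant mentioned above works unchanged.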

http://www.liblfds.org
A lock-free data structure library written in C.
It includes the Michael & Scott (M&S) queue.

If you don't need a lock-free queue, then you can just wrap an existing queue with a lock.
pthread_mutex_t myQueueLock = PTHREAD_MUTEX_INITIALIZER;
Queue myQueue;   /* your existing single-threaded queue type */

void mtQueuePush(int value)
{
    pthread_mutex_lock(&myQueueLock);
    queuePush(&myQueue, value);
    pthread_mutex_unlock(&myQueueLock);
}

int mtQueueNext(void)
{
    pthread_mutex_lock(&myQueueLock);
    int value = queueFront(&myQueue);
    queuePop(&myQueue);
    pthread_mutex_unlock(&myQueueLock);
    return value;
}
The only thing left after that is to add some sort of handling for mtQueueNext when the queue is empty; one option is sketched below.
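For example, a non-blocking variant could return a status and pass the value out through a pointer (the queueEmpty helper is assumed to exist alongside queuePush/queuePop; the status-return convention is my own choice):

int mtQueueTryNext(int* value)
{
    int ok = 0;
    pthread_mutex_lock(&myQueueLock);
    if (!queueEmpty(&myQueue))      /* assumed helper on the underlying queue */
    {
        *value = queueFront(&myQueue);
        queuePop(&myQueue);
        ok = 1;
    }
    pthread_mutex_unlock(&myQueueLock);
    return ok;                      /* 1 = got a value, 0 = queue was empty */
}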
EDIT:
If you have a single reader, single writer lockless queue, you only need to have a lock around mtQueuePush, to prevent multiple simultaneous writers.
There are a number of single reader/writer lockless queues around; however, most of them are implemented as C++ template classes. Do a Google search and, if need be, work out how to rewrite them in plain C.
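For illustration, a single-producer/single-consumer ring can be written in plain C if your compiler supports C11 <stdatomic.h>; this is a sketch of the standard technique, not taken from any particular library:

#include <stdatomic.h>
#include <stddef.h>

#define QSIZE 256   /* keep this a power of two so the free-running indices wrap cleanly */

typedef struct {
    int buf[QSIZE];
    _Atomic size_t head;   /* only advanced by the consumer */
    _Atomic size_t tail;   /* only advanced by the producer */
} spsc_queue;

/* Producer side: returns 0 if the queue is full. */
int spsc_push(spsc_queue *q, int v)
{
    size_t t = atomic_load_explicit(&q->tail, memory_order_relaxed);
    size_t h = atomic_load_explicit(&q->head, memory_order_acquire);
    if (t - h == QSIZE)
        return 0;
    q->buf[t % QSIZE] = v;
    atomic_store_explicit(&q->tail, t + 1, memory_order_release);
    return 1;
}

/* Consumer side: returns 0 if the queue is empty. */
int spsc_pop(spsc_queue *q, int *out)
{
    size_t h = atomic_load_explicit(&q->head, memory_order_relaxed);
    size_t t = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (t == h)
        return 0;
    *out = q->buf[h % QSIZE];
    atomic_store_explicit(&q->head, h + 1, memory_order_release);
    return 1;
}

This only stays safe with exactly one producer thread and one consumer thread; as noted above, multiple writers would each need the lock around the push side.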

I'd go for multiple single-writer queues (one per writer thread). Then you can check this for how to get the single reader to read the various queues.

Related

How to use a semaphore in C to implement a mutex that avoids starvation with a first-in-first-out structure

I'm trying to implement a simple mutex lock using a semaphore that does not fall victim to starvation. In order to do this, I'm pretty sure I need to implement some sort of queue or other first-in-first-out approach, but semaphores in C appear not to respect any sort of FIFO structure. Given that, I've not been able to decipher how to wake and sleep threads in a proper FIFO order. Then again, perhaps I'm barking up the entirely wrong tree in my approach.
This link, https://pubs.opengroup.org/onlinepubs/7908799/xsh/sem_post.html, implies that something with SCHED_FIFO might be able to resolve my issue, but my relative inexperience with C leaves me unsure both whether that could resolve my problem and how I would implement such a solution.
Does C have a way of enabling a FIFO "fair" semaphore to make a fair lock that avoids starvation, or do you need a separate queuing system of some sort? And in either case, how would you approach its implementation?
Thanks for any input you can provide!
Are you only allowed to use a single semaphore as the means to block a thread? If so, then I don't think there is any pretty solution. Here's an ugly one: (pseudo-code)
queue.put(my_thread_id);
semaphore.dec();
while (queue.head() != my_thread_id) {
    semaphore.inc();
    sleep(VERY_SMALL_TIME_INTERVAL);
    semaphore.dec();
}
(void) queue.pop();
... do whatever ...
semaphore.inc();
Suppose that thread A releases the semaphore while threads B, C, and D are awaiting it. Exactly one of B, C, or D will awaken. It will look at the queue, and if its own thread ID is at the head of the queue, it will pop the ID and proceed to do whatever. Otherwise, it will awaken one of the other two, sleep for a bit, and then try again.
In this way, each of the threads will be awakened, one-by-one, until one of them sees its own ID and breaks out of the loop.
The sleep(...) is important. Without it, the fundamental unfairness of the semaphore would make it likely that the subsequent semaphore.dec() call would immediately succeed, and the same thread would keep going round the loop, not seeing its own ID, and starving the others. The sleep(...) blocks the caller, thereby encouraging the OS to waken one of the other waiting threads.
OTOH, are you using the POSIX threads library (pthreads)? And are you allowed to use any means available to block and awaken waiting threads? In that case, you could use a condition variable instead of the semaphore. You'd still need an explicit queue, and you'd still need a loop, but you could get rid of the sleep(...) because pthread_cond_broadcast(...) simultaneously awakens all of the waiting threads.
Condition variables are a bit trickier than semaphores, and it's easy to make mistakes. I suggest you look for a good tutorial if you want to go that way.
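If you can use pthreads, here is a minimal sketch of a FIFO "ticket lock" built from a mutex and a condition variable (the fair_lock type and all names are my own, purely illustrative):

#include <pthread.h>

typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    unsigned long next_ticket;   /* next ticket number to hand out */
    unsigned long now_serving;   /* ticket number currently allowed in */
} fair_lock;

void fair_lock_init(fair_lock *l)
{
    pthread_mutex_init(&l->mutex, NULL);
    pthread_cond_init(&l->cond, NULL);
    l->next_ticket = 0;
    l->now_serving = 0;
}

void fair_lock_acquire(fair_lock *l)
{
    pthread_mutex_lock(&l->mutex);
    unsigned long my_ticket = l->next_ticket++;   /* take a place in line */
    while (l->now_serving != my_ticket)
        pthread_cond_wait(&l->cond, &l->mutex);
    pthread_mutex_unlock(&l->mutex);
}

void fair_lock_release(fair_lock *l)
{
    pthread_mutex_lock(&l->mutex);
    l->now_serving++;                     /* admit the next ticket holder */
    pthread_cond_broadcast(&l->cond);     /* wake everyone; only that holder proceeds */
    pthread_mutex_unlock(&l->mutex);
}

Because admission is decided by ticket number rather than by which thread wins the race to wake up, entry order is strictly FIFO even though pthread_cond_broadcast itself makes no ordering guarantee.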

Thread-safe ring buffer for producers-consumer in C

In C, I have several threads producing long values, and one thread consuming them. Therefore I need a buffer of a fixed size implemented in a similar fashion to, e.g., the Wikipedia implementation, and methods that access it in a thread-safe manner.
On a general level, the following should hold:
When adding to a full buffer, the thread should be blocked (no overwriting old values).
The consumer thread should be blocked until the buffer is full - its job has a high constant cost, and it should do as much work as possible per wakeup. (Does this call for a double-buffered solution?)
I would like to use a tried implementation, preferably from a library. Any ideas?
Motivation & explanation:
I am writing JNI code dealing with deleting global references kept as tags in heap objects.
When an ObjectFree JVMTI event occurs, I get a long tag representing a global reference I need to free using DeleteGlobalRef. For this, I need a JNIEnv reference - and getting it is really costly, so I want to buffer the requests and remove as many as possible at once.
There might be many threads receiving the ObjectFree event, and there will be one thread (mine) doing the reference deletion.
You can use a single buffer, protected by a mutex on every access. You'll need to keep track of how many elements are in use. For signaling, you can use two condition variables: one signaled by the producer threads whenever they place data in the queue, which releases the consumer thread to process the queue until it is empty; and another signaled by the consumer thread when it has emptied the queue, which wakes any blocked producer threads so they can refill it. For the consumer, I recommend locking the queue and taking out as much as possible before releasing the lock (to avoid acquiring the lock too often), especially since the dequeue operation is simple and fast. A sketch follows the links below.
Update
A few useful links:
* Wikipedia explanation
* POSIX Threads
* MSDN
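A minimal sketch of that single-buffer scheme with POSIX threads (the ring_t type and the function names are my own, purely illustrative):

#include <pthread.h>
#include <stddef.h>

#define RING_SIZE 1024

typedef struct {
    long items[RING_SIZE];
    size_t head, tail, count;
    pthread_mutex_t mutex;
    pthread_cond_t not_full;    /* signaled by the consumer after draining */
    pthread_cond_t not_empty;   /* signaled by producers after inserting */
} ring_t;

ring_t ring = {
    .mutex = PTHREAD_MUTEX_INITIALIZER,
    .not_full = PTHREAD_COND_INITIALIZER,
    .not_empty = PTHREAD_COND_INITIALIZER,
};

/* Producer: blocks while the buffer is full (no overwriting old values). */
void ring_put(ring_t *r, long v)
{
    pthread_mutex_lock(&r->mutex);
    while (r->count == RING_SIZE)
        pthread_cond_wait(&r->not_full, &r->mutex);
    r->items[r->tail] = v;
    r->tail = (r->tail + 1) % RING_SIZE;
    r->count++;
    pthread_cond_signal(&r->not_empty);
    pthread_mutex_unlock(&r->mutex);
}

/* Consumer: blocks until there is work, then drains everything in one
   critical section. out must have room for RING_SIZE longs. */
size_t ring_drain(ring_t *r, long *out)
{
    pthread_mutex_lock(&r->mutex);
    while (r->count == 0)
        pthread_cond_wait(&r->not_empty, &r->mutex);
    size_t n = 0;
    while (r->count > 0) {
        out[n++] = r->items[r->head];
        r->head = (r->head + 1) % RING_SIZE;
        r->count--;
    }
    pthread_cond_broadcast(&r->not_full);   /* wake all blocked producers */
    pthread_mutex_unlock(&r->mutex);
    return n;
}

If you want the consumer to stay blocked until the buffer is actually full, as described in the question, change the consumer's wait condition to while (r->count < RING_SIZE), so it sleeps until the buffer fills; the rest of the structure stays the same.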
Two possibilities:
a) malloc() a Buffer struct containing an array to hold some longs and an index - no locking required. Have each producer thread malloc its own Buffer and start loading it up. When a producer thread fills the last array position, it queues the Buffer* to the consumer thread on a producer-consumer queue and immediately malloc()s a new Buffer. The consumer gets the Buffer*s, processes them and then free()s them (or queues them off somewhere else, or pushes them back onto a pool for re-use by producers). This avoids any locks on the buffers themselves, leaving only the lock on the P-C queue. The snag is that a producer that only occasionally generates its longs will not get its data processed until its Buffer fills up, which may take some time (in such a thread, you could push off the Buffer before the array gets full).
b) Declare a Buffer struct containing an array to hold some longs and an index. Protect it with a mutex/futex/CS lock. malloc() just one shared Buffer and have all the threads take the lock, push on their long, and release the lock. If a thread pushes into the last array position, it queues the Buffer* to the consumer thread on a producer-consumer queue, immediately mallocs a new Buffer, and then releases the lock. The consumer gets the Buffer*s, processes them and then free()s them (or queues them off somewhere else, or pushes them back onto a pool for re-use by producers).
You may want to take condition variables into consideration. Take a look at this piece of code for the consumer:
while (load == 0)
    pthread_cond_wait(&notEmpty, &mutex);
It checks whether load (where you store the number of elements in your list) is zero, and if it is, waits until a producer puts a new item in the list.
You should implement the same kind of condition for the producer (when it wants to put an item into a full list).

Do condition variables have queues?

I am implementing a condition variable's wait operation. I have a struct for my condition variable. So far, my struct has a monitor, a queue, and a spinlock. But I am not sure if a condition variable should have a queue of its own. My notify looks like this:
void uthread_cv_notify (uthread_cv_t* cv) {
    uthread_t* waiter_thread;
    spinlock_lock(&cv->spinlock);
    waiter_thread = dequeue(&cv->waiter_queue);
    if (waiter_thread)
    {
        uthread_monitor_exit(cv->mon);
        uthread_stop(TS_BLOCKED);
        uthread_monitor_enter(cv->mon);
        spinlock_unlock(&cv->spinlock);
    }
}
But I wonder whether, in a notify or wait function, I should just enqueue and dequeue on the monitor's waiting queue?
Thanks
The signal operation (which you're calling notify) should not require that the monitor be entered; requiring it is inefficient.
It seems like you're trying to implement a clumsy, old-fashioned condition/monitor system in which the caller of "notify" must be inside the monitor, and it is guaranteed that if a thread is waiting, that thread gets the monitor before the "notify" caller returns to the monitor. (And that waiting thread does not have to have a loop re-testing the condition, either.)
That may be how C. A. R. Hoare initially described monitors and conditions, but the formalism is impractical/inefficient on modern multiprocessor systems, and also on threading implementations which do not have the luxury of being extremely tightly integrated with the low level scheduler (to be able to precisely control which thread gets to run when, so there are no races about who acquires a mutex first: for instance, to be able to transfer a thread from one wait queue to another, etc.)
Note how you're extending the critical section of the monitor over the spinlock_lock operation and over the dequeue operation. Neither of these belong under the monitor. The spinlock is independent, and the queue is guarded by the spinlock, not by the monitor. The monitor should protect the shared variables of the user code only (the special atomic property of the wait operation).
So why do you need an extra queue? You are already storing all the threads that need to be notified.
Also, you probably want to do something like this:
void uthread_cv_notify (uthread_cv_t* cv) {
    uthread_t* waiter_thread;
    spinlock_lock(&cv->spinlock);
    waiter_thread = dequeue(&cv->waiter_queue);
    if (waiter_thread)
    {
        uthread_monitor_exit(cv->mon);
        uthread_stop(TS_BLOCKED);
        uthread_monitor_enter(cv->mon);
    }
    spinlock_unlock(&cv->spinlock);
}
This will ensure that the spin lock is always released.

Manipulating a thread's nice value

I wrote a simple program that implements a master/worker scheme where the master is the main thread, and the workers are created by it.
The main thread writes something to a shared buffer, and the worker threads read this shared buffer; writing and reading are coordinated by a read/write lock.
Unfortunately, this scheme definitely leads to starvation of the main thread, since a single write has to wait for several reads to complete. One possible solution is increasing the priority of the master thread, so that when it wants to write something, it gets immediate access to the shared buffer.
From a great post on a similar issue, I discovered that manipulating the priority of a thread under the SCHED_OTHER policy is probably not allowed; only the nice value can be changed.
I wrote a procedure to give the worker threads a lower priority than the master thread, but it does not seem to work correctly.
void assignWorkerThreadPriority(pthread_t* worker)
{
    struct sched_param worker_sched_param;
    worker_sched_param.sched_priority = 0; // any value other than 0 gives error?
    int policy = SCHED_OTHER;
    int rc = pthread_setschedparam(*worker, policy, &worker_sched_param);
    /* pthread_setschedparam returns the error number directly; it does not set errno */
    printf("Result of changing priority is: %d - %s\n", rc, strerror(rc));
}
I have a two-fold question:
How can I set the nice value of the worker threads to avoid main-thread starvation?
If that is not possible, how can I change the scheduling policy to one that allows changing the priority?
Edit: I managed to run the program using other policies, such as SCHED_FIFO; all I had to do was run the program as a superuser.
You cannot avoid problems using a read/write lock when the read and write usage is so even. You need a different method. You need a lock-free message queue or independent work queues or one of many other techniques.
Here is another way to do the job, the way I would do it. The worker can take the buffer away and work on it rather than keeping it shared:
Write thread:
Create work item.
Lock the mutex or CriticalSection protecting the current queue and pointer to queue.
Add work item to queue.
Release the lock.
Optionally signal a condition variable or Event. Another option is for worker threads to check for work on a timer.
Worker thread:
Create a new queue.
Wait for a condition variable or event or other signal, or wait on a timer.
Lock the mutex or CriticalSection protecting the current queue and pointer to queue.
Set the current queue pointer to the new queue.
Release the lock.
Proceed to work on the now private queue.
Delete the queue when all work items complete.
Now the write thread creates more work items. When all the worker threads have their own private queues to work on, it will be able to write many items in peace.
You can modify this. For example, a worker thread may lock the queue and move a limited number of work items off into its own internal queue instead of taking the whole thing.
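A minimal sketch of that queue-swap idea with pthreads (work_item, work_queue, and the function names are mine; for a linked list, swapping in a fresh queue is just detaching the head):

#include <pthread.h>
#include <stddef.h>

typedef struct work_item {
    struct work_item *next;
    /* ... payload ... */
} work_item;

typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t has_work;
    work_item *head, *tail;   /* the current shared queue */
} work_queue;

/* Write thread: create a work item, lock, append, signal, unlock. */
void wq_push(work_queue *q, work_item *item)
{
    item->next = NULL;
    pthread_mutex_lock(&q->mutex);
    if (q->tail) q->tail->next = item; else q->head = item;
    q->tail = item;
    pthread_cond_signal(&q->has_work);
    pthread_mutex_unlock(&q->mutex);
}

/* Worker thread: swap the shared queue for an empty one and return the
   old contents; the caller works on this now-private list with no lock held. */
work_item *wq_take_all(work_queue *q)
{
    pthread_mutex_lock(&q->mutex);
    while (q->head == NULL)
        pthread_cond_wait(&q->has_work, &q->mutex);
    work_item *batch = q->head;
    q->head = q->tail = NULL;   /* the writer now sees an empty queue */
    pthread_mutex_unlock(&q->mutex);
    return batch;
}

The writer holds the lock only long enough to append one node, and the worker only long enough to detach the list, so all the actual work happens outside the critical section.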

Thread-safety in C?

I want to write a high performance synchronized generator in C. I want to be able to feed events to it and have multiple threads be able to poll/read asynchronously, such that threads never receive duplicates.
I don't really know that much about how synchronization is typically done. Can someone give me a high level explanation of one or more techniques that I might be able to use?
Thanks!
You need a thread implementation; standard C (prior to C11's <threads.h>, at least) has no built-in support for multiprocessing concepts, so threads are typically provided by libraries such as pthreads. Such a library will typically provide you with ways to synchronize the execution of multiple threads, ways to protect data, and so on.
The main concept in thread safety is the mutex (though there are different kinds of locks).
It is used to protect your memory from concurrent accesses and race conditions.
A good example of its use would be a linked list: you can't allow two different threads to modify it at the same time. In your case, you could use a linked list to build a queue, with each thread consuming some data from it.
Obviously there are other synchronization mechanisms, but this one is (by far ?) the most important.
You could have a look at this page (and referenced pages at the bottom) for more implementation details.
Thread safety is only a problem when there are variables shared between threads; if you don't have any shared variables, there is no problem. Every event can be read-only and dispatched to listeners arbitrarily.
Thread safety is achieved by using whatever synchronisation primitives the multithreading implementation provides.
Your starting point would probably be a linked list of events and a lock that protects it; every thread takes the lock, consumes one event by adjusting the pointer to the first event, and then releases the lock. Appending events also locks the entire list. When the list is empty, the workers exit.
From there, various optimisations are possible:
Caching the pointer to the last event, so appending an event to the list becomes cheaper.
Adding a notification mechanism so worker threads can sleep while the list is empty. Typically, this is achieved with something called a condition variable.
Using multiple lists, so that if the first list is locked, the worker can retrieve an event from another list without having to wait for the thread that has currently locked it (sketched below).
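The multiple-list idea can be sketched with pthread_mutex_trylock; the striping layout and names here are my own illustration, not from the answer:

#include <pthread.h>
#include <stddef.h>

#define NLISTS 4

typedef struct event {
    struct event *next;
    /* ... payload ... */
} event;

typedef struct {
    pthread_mutex_t lock;
    event *head;
} event_list;

event_list lists[NLISTS] = {
    { PTHREAD_MUTEX_INITIALIZER, NULL }, { PTHREAD_MUTEX_INITIALIZER, NULL },
    { PTHREAD_MUTEX_INITIALIZER, NULL }, { PTHREAD_MUTEX_INITIALIZER, NULL },
};

/* Try each list without blocking, starting from this worker's "home" list. */
event *take_event(int home)
{
    for (int i = 0; i < NLISTS; i++) {
        event_list *l = &lists[(home + i) % NLISTS];
        if (pthread_mutex_trylock(&l->lock) == 0) {
            event *e = l->head;
            if (e) l->head = e->next;
            pthread_mutex_unlock(&l->lock);
            if (e) return e;
        }
    }
    return NULL;   /* all lists empty or contended; caller may retry or block */
}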
