Thread-safe ring buffer for producers-consumer in C

In C, I have several threads producing long values, and one thread consuming them. Therefore I need a buffer of a fixed size, implemented in a similar fashion to e.g. the Wikipedia implementation, and methods that access it in a thread-safe manner.
On a general level, the following should hold:
When adding to a full buffer, the thread should be blocked (no overwriting old values).
The consumer thread should be blocked until the buffer is full: its job has a high constant cost, so it should do as much work as possible each time it runs. (Does this call for a double-buffered solution?)
I would like to use a tried implementation, preferably from a library. Any ideas?
Motivation & explanation:
I am writing JNI code dealing with deleting global references kept as tags in heap objects.
When an ObjectFree JVMTI event occurs, I get a long tag representing a global reference I need to free using DeleteGlobalRef. For this, I need a JNIEnv reference, and getting it is really costly, so I want to buffer the requests and remove as many as possible at once.
There might be many threads receiving the ObjectFree event, and there will be one thread (mine) doing the reference deletion.

You can use a single buffer, with a mutex when accessed. You'll need to keep track of how many elements are used. For "signaling", you can use condition variables. One that is triggered by the producer threads whenever they place data in the queue; this releases the consumer thread to process the queue until empty. Another that is triggered by the consumer thread when it has emptied the queue; this signals any blocked producer threads to fill the queue. For the consumer, I recommend locking the queue and taking out as much as possible before releasing the lock (to avoid too many locks), especially since the dequeue operation is simple and fast.
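The scheme described above can be sketched roughly as follows. This is only a minimal illustration, not a tested library: the names ring_buffer, rb_put, rb_take_all and rb_demo, and the capacity of 8, are assumptions made for the example. Producers block while the buffer is full, and the consumer drains everything it finds under a single lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define RB_CAPACITY 8   /* assumed size; pick what suits the workload */

typedef struct {
    long            items[RB_CAPACITY];
    size_t          head;       /* index of the oldest element */
    size_t          count;      /* number of elements in use */
    pthread_mutex_t lock;
    pthread_cond_t  not_empty;  /* signalled by producers */
    pthread_cond_t  not_full;   /* broadcast by the consumer */
} ring_buffer;

void rb_init(ring_buffer *rb)
{
    rb->head = rb->count = 0;
    pthread_mutex_init(&rb->lock, NULL);
    pthread_cond_init(&rb->not_empty, NULL);
    pthread_cond_init(&rb->not_full, NULL);
}

/* Producer side: blocks while the buffer is full, never overwrites. */
void rb_put(ring_buffer *rb, long value)
{
    pthread_mutex_lock(&rb->lock);
    while (rb->count == RB_CAPACITY)
        pthread_cond_wait(&rb->not_full, &rb->lock);
    rb->items[(rb->head + rb->count) % RB_CAPACITY] = value;
    rb->count++;
    pthread_cond_signal(&rb->not_empty);
    pthread_mutex_unlock(&rb->lock);
}

/* Consumer side: blocks while empty, then drains everything under one lock.
   out must have room for RB_CAPACITY values. Returns the number copied. */
size_t rb_take_all(ring_buffer *rb, long *out)
{
    pthread_mutex_lock(&rb->lock);
    while (rb->count == 0)
        pthread_cond_wait(&rb->not_empty, &rb->lock);
    size_t n = rb->count;
    for (size_t i = 0; i < n; i++)
        out[i] = rb->items[(rb->head + i) % RB_CAPACITY];
    rb->head = (rb->head + n) % RB_CAPACITY;
    rb->count = 0;
    pthread_cond_broadcast(&rb->not_full);  /* wake every blocked producer */
    pthread_mutex_unlock(&rb->lock);
    return n;
}

static void *rb_producer(void *arg)
{
    for (long v = 1; v <= 100; v++)
        rb_put(arg, v);
    return NULL;
}

/* Demo: three producers push 1..100 each; the caller acts as the consumer. */
long rb_demo(void)
{
    ring_buffer rb;
    pthread_t producers[3];
    long batch[RB_CAPACITY], sum = 0;
    size_t received = 0;

    rb_init(&rb);
    for (int i = 0; i < 3; i++) {
        int rc = pthread_create(&producers[i], NULL, rb_producer, &rb);
        assert(rc == 0);
    }
    while (received < 300) {
        size_t n = rb_take_all(&rb, batch);
        for (size_t i = 0; i < n; i++)
            sum += batch[i];
        received += n;
    }
    for (int i = 0; i < 3; i++)
        pthread_join(producers[i], NULL);
    return sum;
}
```

To make the consumer wait until the buffer is completely full, as the question asks, the wait condition in rb_take_all could be changed to rb->count < RB_CAPACITY. That variant would then need some way to flush a partially filled buffer at shutdown.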
Update
A few useful links:
* Wikipedia explanation
* POSIX Threads
* MSDN

Two possibilities:
a) malloc() a *Buffer struct with an array to hold some longs and an index - no locking required. Have each producer thread malloc its own *Buffer and start loading it up. When a producer thread fills the last array position, queue the *Buffer to the consumer thread on a producer-consumer queue and immediately malloc() a new *Buffer. The consumer gets the *Buffers, processes them and then free()s them (or queues them off somewhere else, or pushes them back onto a pool for re-use by producers). This avoids any locks on the buffers themselves, leaving only the lock on the P-C queue. The snag is that producers that only occasionally generate their longs will not get their data processed until their *Buffer gets filled up, which may take some time (in such a thread, you could push off the *Buffer before the array gets full).
b) Declare a Buffer struct with an array to hold some longs and an index. Protect with a mutex/futex/CS lock. malloc() just one shared *Buffer and have all the threads get the lock, push on their long and release the lock. If a thread pushes in the last array position, queue the *Buffer to the consumer thread on a producer-consumer queue, immediately malloc a new *Buffer and then release the lock. The consumer gets the *Buffers and processes them and then free()s them, (or queues them off somewhere else, or pushes them back onto a pool for re-use by producers).

You may want to take condition variables into consideration. Take a look at this piece of code for the consumer:
while( load == 0 )
    pthread_cond_wait( &notEmpty, &mutex );
It checks whether load (where you store the number of elements in your list) is zero, and if it is, waits until a producer produces a new item and puts it in the list. The mutex must be held when this loop runs; pthread_cond_wait releases it while waiting and re-acquires it before returning.
You should implement the same kind of condition for the producer (for when it wants to put an item in a full list).

Related

having hard time understanding one issue in consumer producer problem

I am having really hard time understanding one issue in consumer producer problem for example in the below image which is about the simple structure of consumer:
My big problem is this: in wait(mutex) and signal(mutex) the parameter mutex is the same for both, so it makes sense that signal(mutex) wakes up a process blocked in wait(mutex). But wait(full) and signal(empty) are passed different parameters, so how can signal(empty) wake up wait(full)? (Note that we assume both full and empty are of type semaphore.)
here is some more information that may help:
also the code for producer is:
The mutex semaphore ensures mutually exclusive access to the shared resource; the full and empty semaphores control when the producer and the consumer are allowed to run. It all depends on how the semaphores are initialized, but basically full should be set up so that the consumer blocks on its first wait(full), while empty should be available on the producer's first wait(empty).
The producer will then produce data and post the full semaphore, which in turn unblocks the consumer. After consuming, the consumer posts empty and blocks on its next wait(full) until the producer posts full again, and so on until infinity or program end. So signal(empty) never wakes wait(full): each semaphore is waited on by one side and posted by the other.
Any producer/consumer solution uses a buffer. Practical buffer implementations need to deal with the buffer having a finite size. It thus needs to solve two synchronization problems. One is the obvious one, the consumer needs to be blocked when the buffer is empty and woken up again when an item enters the buffer. The less obvious one is that producer needs to be blocked when the buffer is filled to capacity, unblocked when a consumer removes an item.
Two very distinct blocking operations that affect different pieces of code. It thus requires two semaphores.
This concept is purely based on synchronization. Note two important things:
1. About full and empty:
The producer cannot produce if the buffer is full, and the consumer cannot consume if the buffer is empty. The semaphores full and empty are used only to enforce this requirement. As your text says, the initial value of empty is n (the size of the buffer) and the initial value of full is 0 (no item for the consumer yet).
Step I. The producer calls wait(empty) to check that the buffer has space (only then does it produce).
Step II. It calls signal(full) to confirm that it has successfully produced one more item. The consumer can consume it now.
Step III. The consumer calls wait(full) to check whether there is something to consume; whenever the producer produces an item, it confirms this through Step II.
Step IV. The consumer calls signal(empty) to confirm that it has consumed one item, so one buffer slot is free again (back to Step I).
2. About mutex: The mutex variable only ensures that at any one time, only one process accesses the buffer. That's why the producer and the consumer both have wait(mutex) and signal(mutex). Whenever any process (be it producer or consumer) accesses the buffer, it acquires mutex, and when it leaves the buffer, it releases mutex.
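Steps I to IV can be sketched with POSIX unnamed semaphores as below. This is an illustration under assumptions: the buffer size of 4, the item count of 100, and the names empty_slots, full_slots and semaphore_demo are invented for the example, and sem_init is assumed to be supported on the platform (it is on Linux).

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>

#define N     4     /* buffer size n from the text (value assumed) */
#define ITEMS 100   /* assumed number of items for the demo */

static int   buffer[N];
static int   in, out;       /* next slot to write / next slot to read */
static sem_t empty_slots;   /* "empty": starts at N */
static sem_t full_slots;    /* "full": starts at 0 */
static sem_t mutex;         /* binary semaphore guarding the buffer */

static void *producer(void *arg)
{
    (void)arg;
    for (int i = 1; i <= ITEMS; i++) {
        sem_wait(&empty_slots);   /* Step I: wait for a free slot */
        sem_wait(&mutex);
        buffer[in] = i;
        in = (in + 1) % N;
        sem_post(&mutex);
        sem_post(&full_slots);    /* Step II: one more item available */
    }
    return NULL;
}

static void *consumer(void *arg)
{
    long *sum = arg;
    for (int i = 0; i < ITEMS; i++) {
        sem_wait(&full_slots);    /* Step III: wait for an item */
        sem_wait(&mutex);
        *sum += buffer[out];
        out = (out + 1) % N;
        sem_post(&mutex);
        sem_post(&empty_slots);   /* Step IV: one more slot is free */
    }
    return NULL;
}

/* Runs one producer and one consumer; returns the sum of consumed items. */
long semaphore_demo(void)
{
    pthread_t p, c;
    long sum = 0;
    int rc;

    in = out = 0;
    rc = sem_init(&empty_slots, 0, N);  assert(rc == 0);
    rc = sem_init(&full_slots, 0, 0);   assert(rc == 0);
    rc = sem_init(&mutex, 0, 1);        assert(rc == 0);

    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, &sum);
    pthread_join(p, NULL);
    pthread_join(c, NULL);

    sem_destroy(&empty_slots);
    sem_destroy(&full_slots);
    sem_destroy(&mutex);
    return sum;
}
```

Note how each counting semaphore is waited on by one side and posted by the other, which is exactly why wait(full) and signal(empty) take different arguments.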

c threads and resource locking

I have a 2 dimensional array and 8 concurrent threads writing to the array. If each thread reads/writes to a different array, will it result in a seg fault?
For example:
char **buffer;
//each thread has its own thread ID
void set(short ID, short elem, char var)
{
    buffer[ID][elem] = var;
}
Would this be ok? I know this is pseudocode-ish, but you get the idea.
If each thread writes to a different sub-array, this aspect of your code will be fine and you will not need locking.
Multiple threads reading or writing to memory does not, by itself, lead to seg faults. What it can do is result in a race condition, where the results depend indeterminately on the ordering of operations of the multiple threads. The consequences depend on what you do with the memory you read ... if you read a value and then use it as an index or dereference a pointer, that might result in an out of bounds access even though the logic of the code if run by just one thread could not.
In your specific case, if each thread writes to non-overlapping memory because it uses a different ID, there's no possibility of a race condition when accessing the array. However, there could be a race condition when assigning the ID, resulting in two threads receiving the same ID ... so you need to use a lock or other way of guaranteeing that doesn't happen.
The main thing you will need to be careful of is how or when the 2D array is allocated. If all allocation occurs before the worker threads begin to access the array(s), and each worker thread reads and writes to only one of the "rows" of the master array for the lifetime of the thread, and it is the only thread to access that row, then you should not have any threading issues accessing or updating entries in the arrays.
If only one thread is writing to a row, but multiple threads could be reading from that same row, then you may need to work out some synchronization plan or else your readers may occasionally see inconsistent / incoherent data due to partial writes by a concurrent writer.
If each worker thread is hard-bound to a single "row" in the master array, it's also possible to allocate and reallocate the memory needed for each row by the worker thread itself, including updating the slot in the main array to point to the row data (re)allocated by the thread. There should be no contention for the pointer slot in the main array because only this worker thread is interested in that slot. Make sure the master array is allocated before any worker threads get started. For this scenario, also make sure that your C RTL malloc implementation is thread safe. (you may have to select a thread-safe RTL in your build options)
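As a small sketch of the safe case described above: all rows are allocated before any worker starts, IDs are handed out by a single thread, and each worker touches only its own row, so no lock is needed. The row length, the worker_arg struct and the name rows_demo are assumptions made for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define THREADS 8
#define ROW_LEN 16   /* assumed row length */

static char **buffer;   /* fully allocated before any thread starts */

typedef struct { short id; } worker_arg;

static void set(short id, short elem, char var)
{
    buffer[id][elem] = var;
}

/* Each worker writes only to the row selected by its own ID. */
static void *worker(void *p)
{
    short id = ((worker_arg *)p)->id;
    for (short e = 0; e < ROW_LEN; e++)
        set(id, e, (char)('A' + id));
    return NULL;
}

/* Returns 1 if every row holds exactly what its own worker wrote. */
int rows_demo(void)
{
    pthread_t tid[THREADS];
    worker_arg args[THREADS];
    int ok = 1;

    buffer = malloc(THREADS * sizeof *buffer);
    assert(buffer != NULL);
    for (int i = 0; i < THREADS; i++) {
        buffer[i] = calloc(ROW_LEN, 1);
        assert(buffer[i] != NULL);
    }

    for (short i = 0; i < THREADS; i++) {
        args[i].id = i;   /* IDs assigned by one thread, so no race here */
        pthread_create(&tid[i], NULL, worker, &args[i]);
    }
    for (int i = 0; i < THREADS; i++)
        pthread_join(tid[i], NULL);   /* join before reading the rows */

    for (int i = 0; i < THREADS; i++) {
        for (int e = 0; e < ROW_LEN; e++)
            if (buffer[i][e] != 'A' + i)
                ok = 0;
        free(buffer[i]);
    }
    free(buffer);
    return ok;
}
```

The main thread reads the rows only after pthread_join, which gives it a consistent view without extra synchronization.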

about synchronization with using multiple semaphores

Hi there. I'm working on an assignment about using POSIX threads with multiple semaphores. In brief: there are 4 kinds of data packets (char/video/audio/image), each carried by a different thread, and there is one shared buffer. The maximum number of threads in the system is given by the user as input; for example, if the user enters 10, then at most 10 threads can be created to transmit data packets over the buffer at a given time. The confusing part for me is that the buffer can hold only a limited number of packets of each type at any instant (for example, at most 10 char packets and 20 video packets, etc.), so we need a different semaphore for each data type. I know how to control the overall buffer size with a semaphore, which is very simple, but I can't work out how to use the per-packet-type semaphores. Every method I have tried ended in deadlock. Here is my pseudocode to make the program clearer.
define struct package
define semaphore list
main
    initialize variables and semaphores
    while threadCounter is less than MaxThreadNumber
        switch(random)
            case 0: create a character package
                    create a thread to insert the package in buffer
            case 1: create a video package
                    create a thread to insert the package in buffer
            case 2: create an image package
                    create a thread to insert the package in buffer
            case 3: create an audio package
                    create a thread to insert the package in buffer
        increment threadCounter by one
    end of while
    create only one thread which will make the dequeue operation
end of main
producer function
    for i->0 to size_of_package
        sem_wait(empty_buffer) // decrement empty_buffer semaphore by size of package
    lock_mutex
        insert item into queue
        decrement counter of the buffer by size of package
    unlock_mutex
    for i->0 to size_of_package
        sem_post(full_buffer) // increment full_buffer semaphore by size of package
end of producer function
consumer function
    while TRUE // Loops forever
        lock_mutex
            if queue is not empty
                dequeue
                increment counter of the buffer by size of package
        unlock_mutex
        for i->0 to size_of_package // I do the sem_wait here because I can't do the dequeue outside the mutex region.
            sem_wait(full_buffer)
        for i->0 to size_of_package
            sem_post(empty_buffer)
end of consumer function
With this implementation the program works correctly, but I couldn't properly use the semaphores that belong to the packet types. I am open to any recommendation and would appreciate every answer.
This is not how semaphores are used. The buffer's control variables/structures should count how many messages are contained in the buffer and of what types. The mutex protects the buffer and its control variables/structures against concurrent access by different threads. A semaphore, if used, just signals the state of the buffer to the consumer and has no connection to the sizes of the packets; it certainly doesn't get incremented by the size of the packet!
You would be better advised to use pthread condition variables instead of semaphores. These are used in connection with the pthread mutex to guarantee race-free signalling between threads. The producer loop does this:
locks the mutex,
modifies the buffer etc to add new packet(s),
signals the condition variable, and
unlocks the mutex.
The consumer loop does this:
locks the mutex,
processes all buffered data,
waits for the condition variable.
Read up on pthread_cond_init, pthread_cond_signal and pthread_cond_wait.
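A rough sketch of this approach for the four packet types follows. It only simulates packets by counting them. The limits of 10 char and 20 video packets come from the question; the audio and image limits of 5, the single shared condition variable, and all names (packet_buffer, pb_put, pb_drain, packet_demo) are assumptions made for the example.

```c
#include <assert.h>
#include <pthread.h>

enum { PKT_CHAR, PKT_VIDEO, PKT_AUDIO, PKT_IMAGE, PKT_TYPES };

#define PACKETS_PER_PRODUCER 50   /* assumed workload for the demo */

typedef struct {
    int count[PKT_TYPES];      /* packets currently buffered, per type */
    int limit[PKT_TYPES];      /* per-type capacity */
    pthread_mutex_t lock;      /* protects count[] */
    pthread_cond_t  changed;   /* broadcast whenever any count changes */
} packet_buffer;

typedef struct { packet_buffer *pb; int type; } producer_arg;

/* Producer: block while this packet type is at its limit, then add one. */
void pb_put(packet_buffer *pb, int type)
{
    pthread_mutex_lock(&pb->lock);
    while (pb->count[type] == pb->limit[type])
        pthread_cond_wait(&pb->changed, &pb->lock);
    pb->count[type]++;
    pthread_cond_broadcast(&pb->changed);
    pthread_mutex_unlock(&pb->lock);
}

/* Consumer: wait until anything is buffered, then process all of it. */
int pb_drain(packet_buffer *pb)
{
    int total = 0;
    pthread_mutex_lock(&pb->lock);
    for (;;) {
        for (int t = 0; t < PKT_TYPES; t++) {
            assert(pb->count[t] <= pb->limit[t]);   /* limits are respected */
            total += pb->count[t];
        }
        if (total > 0)
            break;
        pthread_cond_wait(&pb->changed, &pb->lock);
    }
    for (int t = 0; t < PKT_TYPES; t++)
        pb->count[t] = 0;
    pthread_cond_broadcast(&pb->changed);   /* wake blocked producers */
    pthread_mutex_unlock(&pb->lock);
    return total;
}

static void *pb_producer(void *p)
{
    producer_arg *a = p;
    for (int i = 0; i < PACKETS_PER_PRODUCER; i++)
        pb_put(a->pb, a->type);
    return NULL;
}

/* One producer per packet type; the caller is the single consumer. */
int packet_demo(void)
{
    packet_buffer pb = { .limit = { 10, 20, 5, 5 } };
    pthread_t tid[PKT_TYPES];
    producer_arg args[PKT_TYPES];
    int consumed = 0;

    pthread_mutex_init(&pb.lock, NULL);
    pthread_cond_init(&pb.changed, NULL);
    for (int t = 0; t < PKT_TYPES; t++) {
        args[t] = (producer_arg){ &pb, t };
        pthread_create(&tid[t], NULL, pb_producer, &args[t]);
    }
    while (consumed < PKT_TYPES * PACKETS_PER_PRODUCER)
        consumed += pb_drain(&pb);
    for (int t = 0; t < PKT_TYPES; t++)
        pthread_join(tid[t], NULL);
    return consumed;
}
```

Using one condition variable with broadcast keeps the sketch short; separate condition variables per packet type would reduce spurious wakeups at the cost of more bookkeeping.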
Since it's an assignment, you probably don't need to have real packets data read and write, but just simulate their handling.
In that case, the problem boils down to how to effectively block the producer threads when they reach the limit of packet they can write in the buffer. At the moment, you are using the semaphore to count the individual elements of a packet written in the buffer, as far as I understand.
Imagine that your writes in the buffer are atomic, and that you just want to count the packets, not the packet elements. Each time a producer writes a packet, it must signal it to the consumer, with the appropriate semaphore, and each time the consumer reads a packet, it must signal it to the appropriate producer.
Let me highlight a few other points:
The important property of a semaphore is that a wait on it blocks when its value is zero. For instance, if its initial value is 10, then after 10 successive sem_wait calls, the 11th will block.
You have 4 types of packets, each with a different threshold on the number that can be written in the buffer.
As I said, the producer must signal that it wrote a packet, but it must also be stopped once it reaches the threshold. To achieve that, you make it acquire the semaphore with sem_wait each time it posts a new packet, and you have the consumer do a sem_post each time it reads a packet, the reverse of what you did in your single-semaphore version. However, since you want the producer to stop at the threshold, you initialize the semaphore with a value of N - 1, N being the threshold. Note that you have to signal that a new packet is available after you have written it to the buffer, otherwise the consumer might block the buffer.
producer<type> function
    write_packet() // put the packet in the buffer
    sem_wait(type) // signal a new packet is available
                   // (if there's not enough space for another packet, the producer will block here)
end producer<type> function
consumer function
    while TRUE // Loops forever
        switch packet_available() // look if there's a new packet available
            case video:
                read_packet<video>()
                sem_post(video)
            (...)
            default: // no packet available, just wait a little
                sleep()
        end switch
    end while
end consumer function
You still need to define the read_packet, write_packet, and packet_available functions, probably using a mutex to limit access to the buffer.

Manipulating thread's nice value

I wrote a simple program that implements master/worker scheme where the master is the main thread, and workers are created by it.
The main thread writes something to a shared buffer, and the worker threads read this shared buffer; reads and writes are coordinated by a read/write lock.
Unfortunately, this scheme definitely leads to starvation of main thread, since a single write has to wait on several reads to complete. One possible solution is increasing the priority of the master thread, so if it wants to write something, it will get immediate access to the shared buffer.
According to a great post to a similar issue, I discovered that probably manipulating the priority of a thread under SCHED_OTHER policy is not allowed, what can be changed is the nice value only.
I wrote a procedure to give worker threads lower priority than master thread, but it seems not to work correctly.
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

void assignWorkerThreadPriority(pthread_t* worker)
{
    struct sched_param worker_sched_param;      /* on the stack: nothing to free */
    worker_sched_param.sched_priority = 0;      //any value other than 0 gives error?
    int policy = SCHED_OTHER;
    /* pthread functions return the error code directly; they do not set errno */
    int rc = pthread_setschedparam(*worker, policy, &worker_sched_param);
    printf("Result of changing priority is: %d - %s\n", rc, strerror(rc));
}
I have a two-fold question:
How can I set the nice value of the worker threads to avoid main thread starvation?
If that is not possible, how can I change the scheduling policy to one that allows changing the priority?
Edit: I managed to run the program using other policies, such as SCHED_FIFO; all I had to do was run the program as a superuser.
You cannot avoid problems using a read/write lock when the read and write usage is so even. You need a different method. You need a lock-free message queue or independent work queues or one of many other techniques.
Here is another way to do the job, the way I would do it. The worker can take the buffer away and work on it rather than keeping it shared:
Write thread:
Create work item.
Lock the mutex or CriticalSection protecting the current queue and pointer to queue.
Add work item to queue.
Release the lock.
Optionally signal a condition variable or Event. Another option is for worker threads to check for work on a timer.
Worker thread:
Create a new queue.
Wait for a condition variable or event or other signal, or wait on a timer.
Lock the mutex or CriticalSection protecting the current queue and pointer to queue.
Set the current queue pointer to the new queue.
Release the lock.
Proceed to work on the now private queue.
Delete the queue when all work items complete.
Now the write thread can create more work items. Once all the worker threads have their own private queues to work on, the write thread can add many items in peace.
You can modify this. For example, a worker thread may lock the queue and move a limited number of work items off into its own internal queue instead of taking the whole thing.
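The queue-swap steps above might look roughly like this with pthreads. It is a sketch under assumptions: a singly linked work_queue, an int payload, and the names submit, take_queue, process_and_free and swap_demo are all invented for the example, and only one worker is shown.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

typedef struct work_item { int payload; struct work_item *next; } work_item;
typedef struct { work_item *head, *tail; } work_queue;

static work_queue     *current;   /* pointer to the shared queue */
static pthread_mutex_t lock     = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  has_work = PTHREAD_COND_INITIALIZER;

/* Write thread: create an item, append it under the lock, then signal. */
void submit(int payload)
{
    work_item *it = malloc(sizeof *it);
    assert(it != NULL);
    it->payload = payload;
    it->next = NULL;
    pthread_mutex_lock(&lock);
    if (current->tail) current->tail->next = it; else current->head = it;
    current->tail = it;
    pthread_mutex_unlock(&lock);
    pthread_cond_signal(&has_work);
}

/* Worker: swap in a fresh queue and take the filled one for private use. */
work_queue *take_queue(void)
{
    work_queue *fresh = calloc(1, sizeof *fresh), *mine;
    assert(fresh != NULL);
    pthread_mutex_lock(&lock);
    while (current->head == NULL)
        pthread_cond_wait(&has_work, &lock);
    mine = current;
    current = fresh;
    pthread_mutex_unlock(&lock);
    return mine;
}

/* Work on the now private queue without locking, then delete it. */
long process_and_free(work_queue *q)
{
    long sum = 0;
    for (work_item *it = q->head, *next; it != NULL; it = next) {
        next = it->next;
        sum += it->payload;   /* stand-in for the real work */
        free(it);
    }
    free(q);
    return sum;
}

static void *worker(void *arg)
{
    long *total = arg;
    while (*total < 500500)   /* sum of 1..1000: all items processed */
        *total += process_and_free(take_queue());
    return NULL;
}

/* The caller is the write thread; one worker consumes the items. */
long swap_demo(void)
{
    pthread_t w;
    long total = 0;

    current = calloc(1, sizeof *current);
    assert(current != NULL);
    pthread_create(&w, NULL, worker, &total);
    for (int i = 1; i <= 1000; i++)
        submit(i);
    pthread_join(w, NULL);
    free(current);   /* the last swapped-in queue is empty */
    return total;
}
```

The lock is held only for a pointer update on either side, so the write thread is never kept waiting while a worker processes items.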

Multiple-writer thread-safe queue in C

I am working on a multi-threaded C application using pthreads. I have one thread which writes to a database (the database library is only safe to be used in a single thread), and several threads which are gathering data, processing it, and then need to send the results to the database thread for storage. I've seen it mentioned that it is "possible" to make a multiple-writer safe queue in C, but every place I see this mentioned simply says that it's "too complicated for this example" and merely demonstrates a single-writer safe queue.
I need the following things:
Efficient insertion and removal. I would assume that like any other queue O(1) enqueueing and dequeueing is possible.
Dynamically allocated memory, i.e. a linked structure. I need to not have an arbitrary limit on the size of the queue, so an array really isn't what I'm looking for.
EDIT: Reading threads should not spin on an empty queue, since there is likely to be minutes worth of time with no writes, with short bursts of large numbers of writes.
Sure, there are lockless queues. Based on what you've said in comments, though, performance here is not at all critical, since you're creating a thread per write anyway.
So, this is a standard use case for a condition variable. Make yourself a struct containing a mutex, a condition variable, a linked list (or circular buffer if you like), and a cancel flag:
write:
    lock the mutex
    (optionally - check the cancel flag to prevent leaks of stuff on the list)
    add the event to the list
    signal the condition variable
    unlock the mutex
read:
    lock the mutex
    while (list is empty AND cancel is false):
        wait on the condition variable with the mutex
    if cancel is false: // or "if list non-empty", depending on cancel semantics
        remove an event from the list
    unlock the mutex
    return event if we have one, else NULL meaning "cancelled"
cancel:
    lock the mutex
    set the cancel flag
    (optionally - dispose of anything on the list, since the reader will quit)
    signal the condition variable
    unlock the mutex
If you're using a list with external nodes, then you might want to allocate the memory outside the mutex lock, just to reduce the time it's held for. But if you design the events with an intrusive list node, that's probably easiest.
Edit: you can also support multiple readers (with no portable guarantees for which one gets a given event) if in cancel you change the "signal" to "broadcast". Although you don't need it, it doesn't really cost anything either.
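The write/read/cancel pseudocode above translates to pthreads roughly as follows. This is a sketch under assumptions: the event struct with an intrusive next pointer, the int payload, and the names eq_write, eq_read, eq_cancel and eq_demo are invented for the example. It uses the "if list non-empty" variant, so a reader drains pending events before seeing the cancellation, and cancel uses broadcast so it would also work with several readers.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct event { int value; struct event *next; } event;  /* intrusive node */

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    event *head, *tail;
    bool   cancelled;
} event_queue;

void eq_init(event_queue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->cond, NULL);
    q->head = q->tail = NULL;
    q->cancelled = false;
}

void eq_write(event_queue *q, event *e)   /* caller allocates the event */
{
    e->next = NULL;
    pthread_mutex_lock(&q->lock);
    if (q->tail) q->tail->next = e; else q->head = e;
    q->tail = e;
    pthread_cond_signal(&q->cond);
    pthread_mutex_unlock(&q->lock);
}

/* Blocks (no spinning) until an event arrives or the queue is cancelled.
   Returns NULL only when cancelled and nothing is left to read. */
event *eq_read(event_queue *q)
{
    event *e = NULL;
    pthread_mutex_lock(&q->lock);
    while (q->head == NULL && !q->cancelled)
        pthread_cond_wait(&q->cond, &q->lock);
    if (q->head != NULL) {
        e = q->head;
        q->head = e->next;
        if (q->head == NULL)
            q->tail = NULL;
    }
    pthread_mutex_unlock(&q->lock);
    return e;
}

void eq_cancel(event_queue *q)
{
    pthread_mutex_lock(&q->lock);
    q->cancelled = true;
    pthread_cond_broadcast(&q->cond);   /* wake every waiting reader */
    pthread_mutex_unlock(&q->lock);
}

static void *writer(void *arg)
{
    for (int i = 0; i < 250; i++) {
        event *e = malloc(sizeof *e);   /* allocated outside the lock */
        assert(e != NULL);
        e->value = 1;
        eq_write(arg, e);
    }
    return NULL;
}

typedef struct { event_queue *q; long sum; } reader_ctx;

static void *reader(void *arg)
{
    reader_ctx *c = arg;
    event *e;
    while ((e = eq_read(c->q)) != NULL) {
        c->sum += e->value;   /* stand-in for the database write */
        free(e);
    }
    return NULL;
}

/* Four writers, one reader; returns the number of events the reader saw. */
long eq_demo(void)
{
    event_queue q;
    pthread_t w[4], r;
    reader_ctx ctx = { &q, 0 };

    eq_init(&q);
    pthread_create(&r, NULL, reader, &ctx);
    for (int i = 0; i < 4; i++)
        pthread_create(&w[i], NULL, writer, &q);
    for (int i = 0; i < 4; i++)
        pthread_join(w[i], NULL);
    eq_cancel(&q);   /* all writes are done; let the reader finish */
    pthread_join(r, NULL);
    return ctx.sum;
}
```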
http://www.liblfds.org
Lock-free data structure library written in C.
Has the M&S queue.
If you don't need a lock-free queue, then you could just wrap up an existing queue with a lock.
Mutex myQueueLock;
Queue myQueue;
void mtQueuePush(int value)
{
    lock(myQueueLock);
    queuePush(myQueue, value);
    unlock(myQueueLock);
}
int mtQueueNext()
{
    lock(myQueueLock);
    int value = queueFront(myQueue);
    queuePop(myQueue);
    unlock(myQueueLock);
    return value;
}
The only thing after that is to add some sort of handling for mtQueueNext when the queue is empty.
EDIT:
If you have a single reader, single writer lockless queue, you only need to have a lock around mtQueuePush, to prevent multiple simultaneous writers.
There are a number of single reader/writer lockless queues around; however, most of them are implemented as C++ template classes. Do a Google search and, if need be, work out how to rewrite them in plain C.
I'd go for multiple single-writer queues (one per writer thread). Then you can check this for how to get the single reader to read the various queues.

Resources