about synchronization with using multiple semaphores - c

hi there i'm working on an assignment about using POSIX threads with multi semaphores. the brief explanation of assignment is: there are 4 various data packets (char/video/audio/image), each of them carried by a different thread and also we have a shared buffer. maximum threads can work on system will be maintained by the user as an input. for example; if user enters 10 then maximum 10 thread could be created to transmit data packets over a buffer in a given time. now the confusing part for me is, this buffer can contains limited packets instantly. (for example it can contain maximum 10 char packets and 20 video packets etc.) so we have to have different semaphores for each data type. the issue i know how to control the buffer size with semaphore which is very simple, but cant set the correct idea of using semaphores of packets'. even i tried some different methods i always faced with deadlock errors. here is my pseudocode to understand more clearly of my program.
define struct packege
define semaphore list
main
initialize variables and semaphores
while threadCounter is less than MaxThreadNumber
switch(random)
case 0: create a character package
create a thread to insert the package in buffer
case 1: create a video package
create a thread to insert the package in buffer
case 2: create an image package
create a thread to insert the package in buffer
case 3: create an audio package
create a thread to insert the package in buffer
increment threadCounter by one
end of while
create only one thread which will make the dequeue operation
end of main
producer function
for i->0 to size_of_package
sem_wait(empty_buffer) // decrement empty_buffer semaphore by size of package
lock_mutex
insert item into queueu
decrement counter of the buffer by size of package
unlock_mutex
for i->0 to size_of_package
sem_post(full_buffer) // increment full_buffer semaphore by size of package
end of producer function
consumer function
while TRUE // Loops forever
lock_mutex
if queue is not empty
dequeue
increment counter of the buffer size of package
unlock_mutex
for i->0 to size_of_package // The reason why i making the sem_wait operation here is i cant make the dequeue in outer region of mutex.
sem_wait(full_buffer)
for i->0 to size_of_package
sem_post(empty_buffer)
end of consumer function
with this implementation programe works correctly. but i couldnt use semaphores properly which belongs to threads of packages. i can listen every recommandation and will be appreciated for every answer.

This is not how semaphores are used. The buffer's control variables/structures should count how many messages are contained in the buffer and of what types. The mutex protects the buffer and its control variables/structures against concurrent access by different threads. A semaphore, if used, just signals the state of the buffer to the consumer and has no connection to the sizes of the packets; it certainly doesn't get incremented by the size of the packet!
You would be better advised to use pthread condition variables instead of semaphores. These are used in connection with the pthread mutex to guarantee race-free signalling between threads. The producer loop does this:
locks the mutex,
modifies the buffer etc to add new packet(s),
signals the condition variable, and
unlocks the mutex.
The consumer loop does this:
locks the mutex,
processes all buffered data,
waits for the condition variable.
Read up on pthread_cond_init, pthread_cond_signal and pthread_cond_wait.

Since it's an assignment, you probably don't need to have real packets data read and write, but just simulate their handling.
In that case, the problem boils down to how to effectively block the producer threads when they reach the limit of packet they can write in the buffer. At the moment, you are using the semaphore to count the individual elements of a packet written in the buffer, as far as I understand.
Imagine that your writes in the buffer are atomic, and that you just want to count the packets, not the packet elements. Each time a producer writes a packet, it must signal it to the consumer, with the appropriate semaphore, and each time the consumer reads a packet, it must signal it to the appropriate producer.
Let me highlight a few other points:
The important property of a semaphore is that it will block when it reaches zero. For instance, if its initial value is 10, after 10 successive sem_get, the 11th will block.
You have 4 types of packets, each with a different threshold on the number that can be written in the buffer.
As I said, the producer must signal that it wrote a packet, but it must also be stopped once it reaches the threshold. To achieve that, you make it acquire the semaphore each time it posts a new packet, with sem_get. And you have the consumer do a sem_post each time it read a packet, the reverse of what you did with your single semaphore version. However, since you want the producer stop at the threshold, you initialize the semaphore with a capacity of N - 1, N being the threshold. Note that you have to signal that a new packet is available after you wrote it in the buffer, otherwise the consumer might block the buffer.
producer<type> function
write_packet() // put the packet in the buffer
sem_wait(type) // signal a new packet is available
// (if there's not enough space for another packet, the producer will block here)
end producer<type> function
consumer function
while TRUE // Loops forever
switch packet_available() // look if there's a new packet available
case video:
read_packet<video>()
sem_post(video)
(...)
default: // no packet available, just wait a little
sleep()
end if
end while
You still need to define the packet_read, packet_write, and packet_available functions, probably using a mutex to limit access to the buffer.

Related

does semaphore preserve wait order [duplicate]

I have created a program in C that creates 2 buffers. The buffer indices hold single characters, 'A' or 'b' etc... In order to learn more about multithreading, I created a set of semaphores based on the producer/consumer problem to produce characters and consume characters from the buffers. I have 3 producer threads for each buffer and 10 consumer threads. The consumers take one item from each buffer, then report it (freeing the memory of the consumed item also). Now, from what I've read, sem_wait() is supposed to signal the "longest waiting thread" when it comes out of a blocking state (I read this in a book and in an online POSIX library).
Now, is this actually true?
The application I have made should have both consumers and producers waiting at the same sem_wait() gate, but the producers get into the critical section more than double the time of any consumer. The consumers do have an extra semaphore to wait for, but that shouldn't make that huge of a difference. I can't seem to figure out why it's happening, so I'm hoping someone else does. If I sleep(1) on the producer threads, the consumers get in just fine and the buffers hover around 0 items...like I would think would happen otherwise.
Also, should thread creation order play any role in how I structure the program for fairness?
IE, produce one of each type in a round robin fashion until everyone is created and running.
Are there any methods anyone can describe to me to institute a more fair system of thread access? I've read that creating a FIFO queue system might be one solution, where the longest waiting thread has the highest priority (which is what I thought sem_wait() would do anyways).
Just wondering what methods are out there for both rudimentary and higher level threading.
The POSIX standard actually says that "the highest priority thread that has been waiting the longest shall be unblocked" only when the SCHED_FIFO or SCHED_RR scheduling policy applies to the blocked thread.
If you're not using one of those two realtime scheduling policies, then the semaphore does not have to be "fair".

How to get threads to execute in a certain order?

Almost every resource that I have looked up has talked about how to enforce mutual exclusion, or deal with the producer/consumer problem.
The problem is that I need to get certain threads to execute before other threads, but can't figure out how. I am trying to use semaphores, but don't really see how they can help in my case.
I have
a read thread,
N number of search threads, and
a write thread.
The read thread fills a buffer with data, then the search threads parse the data and outputs it to a different buffer, which the write thread then writes to a file.
Any idea as to how I would accomplish this?
I can post the code I have so far if anyone thinks that would help.
I think what you're looking for is a monitor
I would use a few condition variables.
You have read buffers. Probably two. If the search threads are slow you want read to wait rather than use all the memory on buffers.
So: A read_ready condition variable and a read_ready_mutex. Set to true when there is an open buffer.
Next: A search_ready condition variable and a search_ready_mutex. Set to true when there is a complete read buffer.
Next: A write_ready variable and a write_ready mutex. Set to true when there is work to do for the write thread.
Instead of true/false you could use integers of the number of buffers that are ready. As long as you verify the condition while the mutex is held and only modify the condition while the mutex is held, it will work.
[Too long for a comment]
Cutting this down to two assumptions:
Searchig cannot be done before reading has finished.
Writing cannot be done before searching has finished.
I conclude:
Do not use threads for reading and writing, but do it from the main thread.
Just do the the searching in parallel using threads.
Generally speaking, threads are used precisely when we don't care about the order of execution.
If you want to execute some statements S1, S2, ... , SN in that order, then you catenate them into a program that is run by a single thread: { S1; S2; ...; SN }.
The specific problem you describe can be solved with a synchronization primitive known as a barrier (implemented as the POSIX function pthread_barrier_wait).
A barrier is initialized with a number, the barrier count N. Threads which call the barrier wait operation are suspended, until N threads accumulate. Then they are are all released. One of the threads receives a return value which tells it that it is the "serial thread".
So for instance, suppose we have N threads doing this read, process-in-paralle, and write sequence. It goes like this (pseudocode):
i_am_serial = barrier.wait(); # at first, everyone waits at the barrier
if (i_am_serial) # serial thread does the reading, preparing the data
do_read_task();
barrier.wait(); # everyone rendezvous at the barrier again
do_paralallel_processing(); # everyone performs the processing on the data
i_am_serial = barrier.wait(); # rendezvous after parallel processing
if (i_am_serial)
do_write_report_task(); # serialized integration and reporting of results

epoll IO with worker threads in C

I am writing a small server that will receive data from multiple sources and process this data. The sources and data received is significant, but no more than epoll should be able to handle quite well. However, all received data must be parsed and run through a large number of tests which is time consuming and will block a single thread despite epoll multiplexing. Basically, the pattern should be something like follows: IO-loop receives data and bundles it into a job, sends to the first thread available in the pool, the bundle is processed by the job and the result is passed pack to the IO loop for writing to file.
I have decided to go for a single IO thread and N worker threads. The IO thread for accepting tcp connections and reading data is easy to implement using the example provided at:
http://linux.die.net/man/7/epoll
Thread are also usually easy enough to deal with, but I am struggling to combine the epoll IO loop with a threadpool in an elegant manner. I am unable to find any "best practice" for using epoll with a worker pool online either, but quite a few questions regarding the same topic.
I therefore have some question I hope someone can help me answering:
Could (and should) eventfd be used as a mechanism for 2-way synchronization between the IO thread and all the workers? For instance, is it a good idea for each worker thread to have its own epoll routine waiting on a shared eventfd (with a struct pointer, containing data/info about the job) i.e. using the eventfd as a job queue somehow? Also perhaps have another eventfd to pass results back into the IO thread from multiple worker threads?
After the IO thread is signaled about more data on a socket, should the actual recv take place on the IO thread, or should the worker recv the data on their own in order to not block the IO thread while parsing data frames etc.? In that case, how can I ensure safety, e.g. in case recv reads 1,5 frames of data in a worker thread and another worker thread receives the last 0,5 frame of data from the same connection?
If the worker thread pool is implemented through mutexes and such, will waiting for locks block the IO thread if N+1 threads are trying to use the same lock?
Are there any good practice patterns for how to build a worker thread pool around epoll with two way communication (i.e. both from IO to workers and back)?
EDIT: Can one possible solution be to update a ring buffer from the IO-loop, after update send the ring buffer index to the workers through a shared pipe for all workers (thus giving away control of that index to the first worker that reads the index off the pipe), let the worker own that index until end of processing and then send the index number back into the IO-thread through a pipe again, thus giving back control?
My application is Linux-only, so I can use Linux-only functionality in order to achieve this in the most elegant way possible. Cross platform support is not needed, but performance and thread safety is.
In my tests, one epoll instance per thread outperformed complicated threading models by far. If listener sockets are added to all epoll instances, the workers would simply accept(2) and the winner would be awarded the connection and process it for its lifetime.
Your workers could look something like this:
for (;;) {
nfds = epoll_wait(worker->efd, &evs, 1024, -1);
for (i = 0; i < nfds; i++)
((struct socket_context*)evs[i].data.ptr)->handler(
evs[i].data.ptr,
evs[i].events);
}
And every file descriptor added to an epoll instance could have a struct socket_context associated with it:
void listener_handler(struct socket_context* ctx, int ev)
{
struct socket_context* conn;
conn->fd = accept(ctx->fd, NULL, NULL);
conn->handler = conn_handler;
/* add to calling worker's epoll instance or implement some form
* of load balancing */
}
void conn_handler(struct socket_context* ctx, int ev)
{
/* read all available data and process. if incomplete, stash
* data in ctx and continue next time handler is called */
}
void dummy_handler(struct socket_context* ctx, int ev)
{
/* handle exit condition async by adding a pipe with its
* own handler */
}
I like this strategy because:
very simple design;
all threads are identical;
workers and connections are isolated--no stepping on each other's toes or calling read(2) in the wrong worker;
no locks are required (the kernel gets to worry about synchronization on accept(2));
somewhat naturally load balanced since no busy worker will actively contend on accept(2).
And some notes on epoll:
use edge-triggered mode, non-blocking sockets and always read until EAGAIN;
avoid dup(2) family of calls to spare yourself from some surprises (epoll registers file descriptors, but actually watches file descriptions);
you can epoll_ctl(2) other threads' epoll instances safely;
use a large struct epoll_event buffer for epoll_wait(2) to avoid starvation.
Some other notes:
use accept4(2) to save a system call;
use one thread per core (1 for each physical if CPU-bound, or 1 for each each logical if I/O-bound);
poll(2)/select(2) is likely faster if connection count is low.
I hope this helps.
When performing this model, because we only know the packet size once we have fully received the packet, unfortunately we cannot offload the receive itself to a worker thread. Instead the best we can still do is a thread to receive the data which will have to pass off pointers to fully received packets.
The data itself is probably best held in a circular buffer, however we will want a separate buffer for each input source (if we get a partial packet we can continue receiving from other sources without splitting up the data. The remaining question is how to inform the workers of when a new packet is ready, and to give them a pointer to the data in said packet. Because there is little data here, just some pointers the most elegant way of doing this would be with posix message queues. These provide the ability for multiple senders and multiple receivers to write and read messages, always ensuring every message is received and by precisely 1 thread.
You will want a struct resembling the one below for each data source, I shall go through the fields purposes now.
struct DataSource
{
int SourceFD;
char DataBuffer[MAX_PACKET_SIZE * (THREAD_COUNT + 1)];
char *LatestPacket;
char *CurrentLocation
int SizeLeft;
};
The SourceFD is obviously the file descriptor to the data stream in question, the DataBuffer is where Packets contents are held while being processed, it is a circular buffer. The LatestPacket pointer is used to temporarily hold a pointer to the most resent packet in case we receive a partial packet and move onto another source before passing the packet off. The CurrentLocation stores where the latest packet ends so that we know where to place the next one, or where to carry on in case of partial receive. The size left is the room left in the buffer, this will be used to tell if we can fit the packet or need to circle back around to the beginning.
The receiving function will thus effectively
Copy the contents of the packet into the buffer
Move CurrentLocation to point to the end of the packet
Update SizeLeft to account for the now decreased buffer
If we cannot fit the packet in the end of the buffer we cycle around
If there is no room there either we try again a bit later, going to another source meanwhile
If we had a partial receive store the LatestPacket pointer to point to the start of the packet and go to another stream until we get the rest
Send a message using a posix message queue to a worker thread so it can process the data, the message will contain a pointer to the DataSource structure so it can work on it, it also needs a pointer to the packet it is working on, and it's size, these can be calculated when we receive the packet
The worker thread will do its processing using the received pointers and then increase the SizeLeft so the receiver thread will know it can carry on filling the buffer. The atomic functions will be needed to work on the size value in the struct so we don't get race conditions with the size property (as it is possible it is written by a worker and the IO thread simultaneously, causing lost writes, see my comment below), they are listed here and are simple and extremely useful.
Now, I have given some general background but will address the points given specifically:
Using the EventFD as a synchronization mechanism is largely a bad idea, you will find yourself using a fair amount of unneeded CPU time and it is very hard to perform any synchronization. Particularly if you have multiple threads pick up the same file descriptor you could have major problems. This is in effect a nasty hack that will work sometimes but is no real substitute for proper synchronization.
It is also a bad idea to try and offload the receive as explained above, you can get around the issue with complex IPC but frankly it is unlikely receiving IO will take enough time to stall your application, your IO is also likely much slower than CPU so receiving with multiple threads will gain little. (this assumes you do not say, have several 10 gigabit network cards).
Using mutexes or locks is a silly idea here, it fits much better into lockless coding given the low amount of (simultaneously) shared data, you are really just handing off work and data. This will also boost performance of the receive thread and make your app far more scalable. Using the functions mentioned here http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html you can do this nice and easily. If you did do it this way, what you would need is a semaphore, this can be unlocked every time a packet is received and locked by each thread which starts a job to allow dynamically more threads in if more packets are ready, that would have far less overhead then a homebrew solution with mutexes.
There is not really much difference here to any thread pool, you spawn a lot of threads then have them all block in mq_receive on the data message queue to wait for messages. When they are done they send their result back to the main thread which adds the results message queue to its epoll list. It can then receive results this way, it is simple and very efficient for small data payloads like pointers. This will also use little CPU and not force the main thread to waste time managing workers.
Finally your edit is fairly sensible, except for the fact as I ave suggested, message queues are far better than pipes here as they very efficiently signal events , guarantee a full message read and provide automatic framing.
I hope this helps, however it is late so if I missed anything or you have questions feel free to comment for clarification or more explanation.
I post the same answer in other post: I want to wait on both a file descriptor and a mutex, what's the recommended way to do this?
==========================================================
This is a very common seen problem, especially when you are developing network server-side program. Most Linux server-side program's main look will loop like this:
epoll_add(serv_sock);
while(1){
ret = epoll_wait();
foreach(ret as fd){
req = fd.read();
resp = proc(req);
fd.send(resp);
}
}
It is single threaded(the main thread), epoll based server framework. The problem is, it is single threaded, not multi-threaded. It requires that proc() should never blocks or runs for a significant time(say 10 ms for common cases).
If proc() will ever runs for a long time, WE NEED MULTI THREADS, and executes proc() in a separated thread(the worker thread).
We can submit task to the worker thread without blocking the main thread, using a mutex based message queue, it is fast enough.
Then we need a way to obtain the task result from a worker thread. How? If we just check the message queue directly, before or after epoll_wait(), however, the checking action will execute after epoll_wait() to end, and epoll_wait() usually blocks for 10 micro seconds(common cases) if all file descriptors it wait are not active.
For a server, 10 ms is quite a long time! Can we signal epoll_wait() to end immediately when task result is generated?
Yes! I will describe how it is done in one of my open source project.
Create a pipe for all worker threads, and epoll waits on that pipe as well. Once a task result is generated, the worker thread writes one byte into the pipe, then epoll_wait() will end in nearly the same time! - Linux pipe has 5 us to 20 us latency.
In my project SSDB(a Redis protocol compatible in-disk NoSQL database), I create a SelectableQueue for passing messages between the main thread and worker threads. Just like its name, SelectableQueue has an file descriptor, which can be wait by epoll.
SelectableQueue: https://github.com/ideawu/ssdb/blob/master/src/util/thread.h#L94
Usage in main thread:
epoll_add(serv_sock);
epoll_add(queue->fd());
while(1){
ret = epoll_wait();
foreach(ret as fd){
if(fd is worker_thread){
sock, resp = worker->pop_result();
sock.send(resp);
}
if(fd is client_socket){
req = fd.read();
worker->add_task(fd, req);
}
}
}
Usage in worker thread:
fd, req = queue->pop_task();
resp = proc(req);
queue->add_result(fd, resp);

having hard time understanding one issue in consumer producer problem

I am having really hard time understanding one issue in consumer producer problem for example in the below image which is about the simple structure of consumer:
My big problem is that in wait(mutex) and signal(mutex) the parameter mutex is the same for both so it makes sense that signal(mutex) wake up wait(mutex) process if it is blocked but in wait(full) and signal(empty) they pass different parameters so how signal(empty) can wake up wait(full)??????(it is noteworthy that we assume both full and empty are of type semaphore)
here is some more information that may help:
also the code for producer is:
The mutex semaphore handles avoidance of mutual access to some shared resource, the full and empty semaphores handle when producer and when consumer is allowed to run. It all depends on the setup of the semaphores but basically full should be set up to block on the first wait of the consumer, empty should be available on first wait in consumer.
The producer will then handle data and post on the full semaphore, which in turn will unblock the consumer task. Consumer will block on the next empty wait until producer posts the empty semaphore and so on until infinity or program end.
Any producer/consumer solution uses a buffer. Practical buffer implementations need to deal with the buffer having a finite size. It thus needs to solve two synchronization problems. One is the obvious one, the consumer needs to be blocked when the buffer is empty and woken up again when an item enters the buffer. The less obvious one is that producer needs to be blocked when the buffer is filled to capacity, unblocked when a consumer removes an item.
Two very distinct blocking operations that affect different pieces of code. It thus requires two semaphores.
This concept is purely based on synchronization. Note two important things:
1. About full and empty:
Producer can not produce if the buffer is full and the consumer can not consume if the buffer is empty. So, semaphore full and empty are used only to check this requirement. Please, refer to your text, the initial value for empty is n(size of buffer) and initial value of full is 0(no item for consumer yet).
Step I. Producer has wait(empty) to check if the buffer has space(only then produce).
Step II. It has signal(full) to confirm that it has successfully produced one more item. Consumer can consume it now.
Step III. Consumer has wait(full) to check if it can consume something or not as whenever producer will produce an item, he will confirm(through Step II).
Step IV. Consumer has signal(empty) to confirm that it has consumed one time and so, the buffer space is free.(again Step I).
2.About mutex: The mutex variable is only to ensure that at one time,only one process accesses the buffer. That's why Producer and Consumer both have wait(mutex) and signal(mutex). Whenever any process(be it producer or consumer) accesses buffer, it acquires mutex and when it leaves buffer, it releases mutex.

Thread-safe ring buffer for producers-consumer in C

In C, I have several threads producing long values, and one thread consuming them. Therefore I need a buffer of a fixed size implemented in a similar fashion to i.e. the Wikipedia implementation, and methods that access it in a thread-safe manner.
On a general level, the following should hold:
When adding to a full buffer, the thread should be blocked (no overwriting old values).
The consumer thread should be blocked until the buffer is full - it's job has a high constant cost, and should do as much work as possible. (Does this call for a double-buffered solution?)
I would like to use a tried implementation, preferably from a library. Any ideas?
Motivation & explanation:
I am writing JNI code dealing with deleting global references kept as tags in heap objects.
When a ObjectFree JVMTI event occurs, I get a long tag representing a global reference I need to free using DeleteGlobalRef. For this, I need a JNIEnv reference - and getting it is really costly, so I want to buffer the requests and remove as many as possible at once.
There might be many threads receiving the ObjectFree event, and there will be one thread (mine) doing the reference deletion.
You can use a single buffer, with a mutex when accessed. You'll need to keep track of how many elements are used. For "signaling", you can use condition variables. One that is triggered by the producer threads whenever they place data in the queue; this releases the consumer thread to process the queue until empty. Another that is triggered by the consumer thread when it has emptied the queue; this signals any blocked producer threads to fill the queue. For the consumer, I recommend locking the queue and taking out as much as possible before releasing the lock (to avoid too many locks), especially since the dequeue operation is simple and fast.
Update
A few useful links:
* Wikipedia explanation
* POSIX Threads
* MSDN
Two possibilities:
a) malloc() a *Buffer struct with an array to hold some longs and an index - no locking required. Have each producer thread malloc its own *Buffer and start loading it up. When a producer thread fills the last array position, queue the *Buffer to the consumer thread on a producer-consumer queue and immediately malloc() a new *Buffer. The consumer gets the *Buffers and processes them and then free()s them, (or queues them off somewhere else, or pushes them back onto a pool for re-use by producers). This avoids any locks on the buffers themselves, leaving only the lock on the P-C queue. The snag is that producers that only occasionally generate their longs will not get their data processed until their *Buffer gets filled up, which may take some time, (you could push off the Buffer before the array gets full, in such a thread.
b) Declare a Buffer struct with an array to hold some longs and an index. Protect with a mutex/futex/CS lock. malloc() just one shared *Buffer and have all the threads get the lock, push on their long and release the lock. If a thread pushes in the last array position, queue the *Buffer to the consumer thread on a producer-consumer queue, immediately malloc a new *Buffer and then release the lock. The consumer gets the *Buffers and processes them and then free()s them, (or queues them off somewhere else, or pushes them back onto a pool for re-use by producers).
You may want to take condition in consideration. Take a look at this piece of code for consumer :
while( load == 0 )
pthread_cond_wait( &notEmpty, &mutex );
What it does is to check to see whether load ( where you store the number of elements in your list ) is zero or not and if it is zero, it'll wait until producer produces new item and put it in the list.
You should implement the same condition for producer ( when it wants to put item in a full list )

Resources