I have the following problem to solve:
Consider an application with three types of threads: Calculus-A, Calculus-B and Finalization. Whenever a Calculus-A thread ends, it calls the routine endA(), which returns immediately. Whenever a Calculus-B thread ends, it calls the routine endB(), which returns immediately. Finalization threads call the routine wait(), which returns only once two Calculus-A threads and two Calculus-B threads have completed. In other words, for exactly 2 completions of Calculus-A and 2 completions of Calculus-B, one Finalization thread is allowed to continue.
There is an undetermined number of threads of the 3 types. The order in which threads call the routines is not known. Threads Completion are answered in the order of arrival.
Implement routines endA(), endB() and wait() using semaphores. Apart from variable initialization, the only possible operations are P and V. Solutions with busy-waiting are not acceptable.
Here is my solution:
semaphore calcA = 2;
semaphore calcB = 2;
semaphore wait = -3;
void endA()
{
P(calcA);
V(wait);
}
void endB()
{
P(calcB);
V(wait);
}
void wait()
{
P(wait);
P(wait);
P(wait);
P(wait);
V(calcA);
V(calcA);
V(calcB);
V(calcB);
}
I believe that there will be a deadlock due to wait's initialization if wait() executes before endA() and endB(). Is there any other solution for this?
I tend to view semaphore problems as problems where one must identify "sources of waiting" and define for each a semaphore and a protocol for their access.
With that in mind, the "sources of waiting" are
Completions of CalcA
Completions of CalcB
Maybe, if I understood this right, a wait on whole completion groups, consisting of two CalcAs and two CalcBs. I say maybe because I'm not sure what "Threads Completion are answered in the order of arrival." means.
Completions of CalcA and CalcB should therefore increment their respective counters. At the other end, one Finalization thread gains exclusive access to the counters and waits in any order for the needed number of completions to constitute a completion group. It then unlocks access to the next group.
My code is below, although since I'm unfamiliar with the Dutch V and P I will use take()/give().
semaphore calcA = 0;
semaphore calcB = 0;
semaphore groupSem = 1;
void endA(){
give(calcA);
}
void endB(){
give(calcB);
}
void wait(){
take(groupSem);
take(calcA);
take(calcA);
take(calcB);
take(calcB);
give(groupSem);
}
The groupSem semaphore ensures all-or-nothing: the thread that enters the critical section will get the next two completions of each of CalcA and CalcB. If groupSem weren't there, the first thread to enter wait could take two As and block, then be overtaken by another thread that grabs two As and two Bs and runs away with them.
A worse problem without groupSem arises if this second thread takes two As and one B and then blocks, and the first thread then grabs the second B. If the result of the finalization is what allows more runs of CalcA and CalcB, then you may have a deadlock: no further calculation instances can complete, so the finalization threads are left hanging, unable to produce more calculation instances.
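For reference, the protocol above might look like this with POSIX semaphores. This is only a minimal sketch under some assumptions: take()/give() are mapped to sem_wait()/sem_post(), wait() is renamed wait_group() so it does not clash with the libc wait(), init_sems() is a helper name made up for the example, and error checking is omitted.

```c
#include <semaphore.h>

static sem_t calcA, calcB, groupSem;

/* Hypothetical helper: call once before any thread starts. */
void init_sems(void) {
    sem_init(&calcA, 0, 0);     /* no CalcA completions recorded yet */
    sem_init(&calcB, 0, 0);     /* no CalcB completions recorded yet */
    sem_init(&groupSem, 0, 1);  /* one Finalization thread collects at a time */
}

void endA(void) { sem_post(&calcA); }   /* give(calcA): never blocks */
void endB(void) { sem_post(&calcB); }   /* give(calcB): never blocks */

/* Renamed from wait() to avoid the libc function of the same name. */
void wait_group(void) {
    sem_wait(&groupSem);    /* take(groupSem): become the collecting thread */
    sem_wait(&calcA);
    sem_wait(&calcA);
    sem_wait(&calcB);
    sem_wait(&calcB);
    sem_post(&groupSem);    /* give(groupSem): let the next thread collect */
}
```

Because the completions are counted in the semaphores themselves, it does not matter whether endA()/endB() run before or after wait_group() is entered.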
I'm reading this book here (official link, it's free) to understand threads and parallel programming.
Here's the question.
Why does the book say that pthread_cond_signal must be done with a lock held to prevent data race? I wasn't sure, so I referred to this question (and this question too), which basically said "no, it's not required". Why would a race condition occur?
What and where is the race condition being described?
The code and passage in question is as follows.
...
The code to wake a thread, which would run in some other thread, looks like this:
pthread_mutex_lock(&lock);
ready = 1;
pthread_cond_signal(&cond);
pthread_mutex_unlock(&lock);
A few things to note about this code sequence. First, when signaling (as well as when modifying the global variable ready), we always make sure to have the lock held. This ensures that we don’t accidentally introduce a race condition into our code.
...
(please refer to the free, official pdf to get context.)
I couldn't comment with a small question in the link-2, so here is a full question.
Edit 1: I understand the lock is to control access to the ready variable. I am wondering why there's a race condition associated with the signaling. Specifically,
First, when signaling [...] we always make sure to have the lock held. This ensures that we don’t accidentally introduce a race condition into our code
Edit 2: I've seen resources and comments (from links commented below and during my own research), sometimes within the same page, that say either that it doesn't matter or that you must put it within a lock for Predictable Behavior™ (would be nice if this can be touched upon too, if the behavior can be other than spurious wakeups). What must I follow?
Edit 3: I'm looking for more of a 'theoretical' answer, not implementation specific so that I can understand the core idea. I understand answers to these can be platform specific, but an answer that focuses on the core ideas of lock, mutex, condition variable as all implementations must follow these semantics, perhaps adding their own little quirks. Example, wait() can wake up spuriously, and given bad timing of signaling, can happen on 'pure' implementations too. Mentioning these would help.
My apologies for so many edits, but my dearth of in-depth knowledge in this field is confusing the heck outta me.
Any insight would be really helpful, thanks. Also, please feel free to point me to books where I can read these concepts in detail, and where I can learn C++ with these concepts too. Thanks.
Why does the book say that pthread_cond_signal must be done with a lock held to prevent data race? I wasn't sure, so I referred to this question (and this question too), which basically said "no, it's not required". Why would a race condition occur?
The book not presenting a complete example, my best guess as to the intended meaning is that there can be a data race with the CV itself if it is signaled without the associated mutex being held. That may be the case for some CV implementations, but the book is talking specifically about pthreads, and pthreads CVs are not subject to such a limitation. Neither is C++ std::condition_variable, which is what the two other SO questions you referred to are talking about. So in that sense, the book is just wrong.
It is true that one can compose examples of poor CV use, in conjunction with which signaling under protection of the associated mutex largely protects against data races, but signaling without such protection is susceptible to data races. But in such a case, the fault is not with the signaling itself, but with the waiting, and if that's what the book means then it is deceptively worded. And probably still wrong.
What and where is the race condition being described?
One can only guess what the author had in mind.
For the record, the proper usage of condition variables involves firstly determining what condition one wants to ensure holds before execution proceeds. That condition will necessarily involve shared variables, else there is no reason to expect that anything another thread does could change whether the condition is satisfied. That being the case, all access to the shared variables involved needs to be protected by a mutex if more than one thread is alive.
That mutex should then, secondly, also be the one associated with the CV, and threads must wait on the CV only while the mutex is held. This is a requirement of every CV implementation I know, and it protects against signals being missed and possible deadlock resulting from that. Consider this faulty, and somewhat contrived, example:
// BAD
int temp;
result = pthread_mutex_lock(m);
// handle failure results ...
temp = shared;
result = pthread_mutex_unlock(m);
// handle failure results ...
if (temp == 0) {
result = pthread_cond_wait(cv, m);
// handle failure results ...
}
// do something ...
Suppose that it was allowed to wait on the CV without holding the mutex, as that code does. That code supposes that at some point in the future, some other thread (T2) will update shared (under protection of the mutex) and then signal the CV to tell the waiting one (T1) that it can proceed. But what if T2 does that between when T1 unlocks the mutex and when it begins its wait? It doesn't matter whether T2 signals the CV under protection of the mutex or not -- T1 will begin a wait for a signal that has already been delivered. And CV signals do not queue.
So suppose that T1 only waits under protection of the mutex, as is in fact required. That's not enough. Consider this:
// ALSO BAD
result = pthread_mutex_lock(m);
// handle failure results ...
if (shared == 0) {
result = pthread_cond_wait(cv, m);
// handle failure results ...
}
result = pthread_mutex_unlock(m);
// handle failure results ...
// do something ...
This is still wrong, because it does not reliably prevent T1 from proceeding past the wait when the condition of interest is unsatisfied. Such a scenario can arise from
the signal being legitimately sent and received even though the particular condition of interest to T1 is not satisfied
the signal being legitimately sent and received, and the condition being satisfied when the signal is sent, but T2 or another thread modifying the shared variable again before T1 returns from its wait.
spurious return from the wait, which is very rare, but does occasionally happen in many real-world implementations.
None of that depends on T2 sending the signal without mutex protection.
The correct way to wait on a condition variable is to check the condition of interest before waiting, and afterward to loop back and check again before proceeding:
// OK
result = pthread_mutex_lock(m);
// handle failure results ...
while (shared == 0) { // <-- 'while', not 'if'
result = pthread_cond_wait(cv, m);
// handle failure results ...
}
// shared != 0 here; typically one resets it (shared = 0) at this point
result = pthread_mutex_unlock(m);
// handle failure results ...
// do something ...
It may sometimes be the case that thread T1 executing that code will return from its wait when the condition is not satisfied, but if ever it does then it will simply return to waiting instead of proceeding when it shouldn't. If other threads signal only under protection of the mutex then that should be rare, but still possible. If other threads signal without mutex protection then T1 may wake more often than strictly needed, but there is no data race involved, and no inherent risk of misbehavior.
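To put both sides together, here is a minimal sketch of a waiter and a signaler built on the pattern above. The names publish(), await_value() and publisher_thread() are made up for the example, error handling is omitted, and the signal is deliberately sent after the mutex is released to illustrate that the waiting loop is what provides correctness.

```c
#include <pthread.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static int shared = 0;              /* condition of interest: shared != 0 */

/* T2: change the shared state under the mutex, then signal. */
void publish(int value) {
    pthread_mutex_lock(&m);
    shared = value;
    pthread_mutex_unlock(&m);
    pthread_cond_signal(&cv);       /* POSIX permits signaling without the mutex */
}

/* T1: block until shared != 0, then consume and return the value. */
int await_value(void) {
    pthread_mutex_lock(&m);
    while (shared == 0)             /* re-check the condition after every wakeup */
        pthread_cond_wait(&cv, &m);
    int value = shared;
    shared = 0;                     /* reset the condition for the next round */
    pthread_mutex_unlock(&m);
    return value;
}

/* Thread entry point used only to drive the example. */
void *publisher_thread(void *arg) {
    (void)arg;
    publish(42);
    return NULL;
}
```

If publish() runs before await_value() takes the mutex, the waiter sees shared != 0 and never waits; if it runs afterwards, the waiter is already blocked on the CV and receives the signal. Either way nothing is missed.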
Why does the book say that pthread_cond_signal must be done with a lock held to prevent data race? I wasn't sure, so I referred to this question (and this question too), which basically said "no, it's not required". Why would a race condition occur?
Yes, condition variable notification should generally be performed with the corresponding mutex locked. The reason is not so much to avoid a race condition but to avoid a missed or superfluous notification.
Consider the following piece of code:
#include <condition_variable>
#include <mutex>
#include <queue>
std::queue< int > events;
std::mutex mutex;
std::condition_variable cond;
// Thread 1
void consume_events()
{
std::unique_lock< std::mutex > lock(mutex); // #1
while (true)
{
if (events.empty()) // #2
{
cond.wait(lock); // #3
continue;
}
// Process an event
events.pop();
}
}
// Thread 2
void produce_event(int event)
{
{
std::unique_lock< std::mutex > lock(mutex); // #4
events.push(event); // #5
} // #6
cond.notify_one(); // #7
}
This is a classical example of one producer/one consumer queue of data.
In the line #1 the consumer (Thread 1) locks the mutex. Then, in line #2, it tests if there are any events in the queue and, if there are none, in line #3 unlocks mutex and blocks. When the notification on the condition variable happens, the thread unblocks, immediately locks mutex and continues execution past line #3 (which is to go to line #2 again).
In the line #4 the producer (Thread 2) locks the mutex and in line #5 it enqueues a new event. Because the mutex is locked, event queue modification is safe (line #5 cannot be executed concurrently with line #2), so there is no data race. Then, in line #6, the mutex is unlocked and in line #7 the condition variable is notified.
It is possible that the following happens:
Thread 2 acquires the mutex in line #4.
Thread 1 attempts to acquire the mutex in line #1 or #3 (upon being unblocked by a previous notification). Since the mutex is locked by Thread 2, Thread 1 blocks.
Thread 2 enqueues the event in line #5 and unlocks the mutex in line #6.
Thread 1 unblocks and acquires the mutex. In line #2 it sees that the event queue is not empty and processes the event. On the next loop iteration the queue is empty and the thread blocks in line #3.
Thread 2 notifies Thread 1 in line #7. But there are no queued events, and Thread 1 wakes up in vain.
Though in this particular example, the extra wake up is benign, depending on the loop contents, it may be detrimental. The correct code should call notify_one before unlocking the mutex.
Another example is when one thread is used to initiate some work in the other thread without an explicit queue of events:
std::mutex mutex;
std::condition_variable cond;
// Thread 1
void process_work()
{
std::unique_lock< std::mutex > lock(mutex); // #1
while (true)
{
cond.wait(lock); // #2
// Do some processing // #3
}
}
// Thread 2
void initiate_work_processing()
{
cond.notify_one(); // #4
}
In this case Thread 1 waits until it is time to perform some activity (e.g. render a frame in a video game). Thread 2 periodically initiates that activity by notifying Thread 1 via condition variable.
The problem is that the condition variable does not buffer notifications and acts only on the threads that are actually blocked on it at the point of notification. If there are no threads blocked then the notification does nothing. This means that the following sequence of events is possible:
Thread 1 acquires the mutex in line #1 and blocks in line #2.
Thread 2 decides it is time to perform the periodic activity and notifies Thread 1 in line #4.
Thread 1 unblocks and goes to perform the activities (e.g. render a frame).
It turns out that this frame is a lot of work, and when Thread 2 comes to notify Thread 1 about the next frame in line #4, Thread 1 is still busy with the previous one. This notification gets missed.
Thread 1 is finally done with the frame and blocks in line #2. The user observes a frame dropped.
The above wouldn't have happened if Thread 2 locked mutex before notifying Thread 1 in line #4. If Thread 1 is still busy rendering a frame, Thread 2 would block until Thread 1 is done and only then issue the notification.
However, the correct solution for the above task is to introduce a flag or some other data protected by the mutex that Thread 2 can use to signal Thread 1 that it is time to perform its activities. Aside from fixing the missed notification problem, this also takes care of spurious wakeups.
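A minimal sketch of that flag-based approach, written in C with pthreads (the same shape carries over to std::mutex and std::condition_variable). It assumes a counter rather than a boolean so that several requests can accumulate; pending_frames, request_frame() and wait_for_frame() are names invented for the example.

```c
#include <pthread.h>

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int pending_frames = 0;      /* protected by mutex */

/* Thread 2: record the request in shared state, then notify. */
void request_frame(void) {
    pthread_mutex_lock(&mutex);
    pending_frames++;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&mutex);
}

/* Thread 1: block until at least one request is pending, then claim it.
   Returns the number of requests still pending afterwards. */
int wait_for_frame(void) {
    pthread_mutex_lock(&mutex);
    while (pending_frames == 0)     /* also absorbs spurious wakeups */
        pthread_cond_wait(&cond, &mutex);
    int remaining = --pending_frames;
    pthread_mutex_unlock(&mutex);
    return remaining;
}
```

Thread 1 would call wait_for_frame() and then do its processing outside the lock. A request that arrives while Thread 1 is busy is remembered in pending_frames, so it is picked up on the next call instead of being lost.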
What and where is the race condition being described?
Definition of a data race depends on the memory model used in the particular environment. This means primarily your programming language memory model and may include the underlying hardware memory model (if the programming language relies on the hardware memory model, which is the case with e.g. Assembler).
C++ defines data races as follows:
When an evaluation of an expression writes to a memory location and another evaluation reads or modifies the same memory location, the expressions are said to conflict. A program that has two conflicting evaluations has a data race unless
both evaluations execute on the same thread or in the same signal handler, or
both conflicting evaluations are atomic operations (see std::atomic), or
one of the conflicting evaluations happens-before another (see std::memory_order)
If a data race occurs, the behavior of the program is undefined.
So basically, when multiple threads access the same memory location concurrently (by means other than std::atomic) and at least one of the threads is modifying the data at that location, that is a data race.
For example, we have 5 pieces of data (assume there is plenty of space, so different versions of the data do not overwrite each other):
DATA0, DATA1, DATA2, DATA3, DATA4.
We have 3 threads (fewer than 5) working on the data.
Thread 1, working on DATA1 (version 0), has read from both DATA0 (version 0) and DATA2 (version 0), and creates DATA1 (version 1).
Thread 2, working on DATA3 (version 0), has read from both DATA2 (version 0) and DATA4 (version 0), and creates DATA3 (version 1).
Thread 3, working on DATA2 (version 0), has read from both DATA1 (version 0) and DATA3 (version 0), and creates DATA2 (version 1).
Now suppose thread 1 finishes first. It has several choices. It can work on DATA0 (to create DATA0 version 1), since DATA1 (version 0) and DATA4 (version 0) are available (assume DATA0 and DATA4 are neighbors). It can also work on DATA2 and create DATA2 (version 2) once it finds that both DATA1 (version 1) and DATA3 (version 1) are available.
The requirement is that the next version of a piece of data can be processed once its neighbors are ready (at one version lower).
Finally, I want all threads to exit when all data have reached version 10.
Question: how can I implement this scheme using the pthread library?
Note: I want to have data at different versions at the same time, so creating a barrier to make sure all data reach the same version is not an option.
Let's discuss the implementation. To store all versions (0 to 10) we need 5*11*sizeof(data) space. Let us create two arrays of size 5 x 11. The first array is DATA, such that DATA[i][j] is the j-th version of data i. The second array is an 'Access Matrix' A, which records the state of each index; it can be:
Not started
In Progress
Completed
Algorithm: Each thread searches for an index [i][j] in the matrix such that [i-1][j-1] and [i+1][j-1] are both 'Completed'. It sets A[i][j] to 'In Progress' while working on it and to 'Completed' when done. If i=0, i-1 refers to n-1; if i=n-1, i+1 refers to 0 (like a circular queue). When all entries in the last column are 'Completed', the thread terminates. Otherwise it searches for new data which is not completed.
Using pthread library to realize this:
Important variables: mutex, condition variable.
pthread_mutex_t mutex= PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t condvar= PTHREAD_COND_INITIALIZER;
mutex is a 'lock'. We use it when we need to make an operation atomic, that is, an operation that must complete as a single step without other threads interleaving with it. 'condvar' is a condition variable. Using it, a thread can sleep until a condition is reached; when it is reached, the thread is woken up. This avoids busy-waiting in a loop.
Here, our atomic operation is updating A. Reason: if threads update A simultaneously, it may lead to race conditions such as more than one thread working on the same data in parallel.
To realize this, we search and set A while holding the lock. Once A is set, we release the lock and work on the data. But if no workable data was found, we wait on the condition variable condvar. When we call wait on condvar, we also pass mutex. The wait function, called with the lock held, releases the mutex and waits for the condition variable to be signaled. Once it is signaled, it reacquires the lock and proceeds with execution. While waiting, the thread sleeps and hence does not waste CPU time.
Whenever a thread finishes working on a piece of data, it may make one or more other pieces workable. Hence, after a thread finishes its work, it signals all other threads to check for 'workable' data before continuing the algorithm. Pseudocode for this is as follows:
Read the comments and function names; they describe how the pthread library is used. When compiling with gcc, add the -pthread flag (or link with -lpthread); for further details of the library, the man pages of these functions are more than sufficient.
void *thread(void *arg)
{
    //Note: If several threads reach pthread_mutex_lock(&mutex) at once,
    // each sleeps until it can acquire the lock.
    pthread_mutex_lock(&mutex); //Do the searching inside the lock
    while(lastColumnNotDone){ //Re-checked under the lock on every iteration
        //Search for a workable index [i][j]
        if(found)
        {
            //Set A[i][j] to 'In Progress' (still holding the lock)
            pthread_mutex_unlock(&mutex); //A is updated, so release the lock while working so other threads can use it.
            //WORK ON DATA
            pthread_mutex_lock(&mutex); //Reacquire the lock before touching A again.
            //Set A[i][j] to 'Completed'
            pthread_cond_broadcast(&condvar); //Wakes up all threads waiting on the condition variable 'condvar'.
        }
        else //No workable data found
            pthread_cond_wait(&condvar,&mutex); //Releases mutex while waiting and reacquires it before returning.
    }
    pthread_mutex_unlock(&mutex);
    return NULL;
}
Search and checking if all data is completed in the while loop condition can be optimized but that is a different algorithms question. Key idea here is use of pthread library and thread concept.
A is a common access matrix. Do NOT update it outside of lock.
While checking anything with respect to A, such as finding a process or checking if all data is done, lock must be held. Otherwise A can be changed by a different thread at the same time a thread is reading it.
We acquire and release locks using the functions pthread_mutex_lock and pthread_mutex_unlock. Remember, these functions take a pointer to the mutex, not its value: the mutex is a variable that needs to be accessed and updated.
Avoid holding the lock for long amounts of time. This will cause the threads to wait for a long time for small access needs.
When calling wait, be sure that the lock is held. Wait unlocks the mutex passed as the second parameter for the duration of its wait. After receiving the signal to wake up, it reacquires the lock before returning.
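The scheme described above could be fleshed out roughly as follows. This is a sketch under stated assumptions, not a definitive implementation: the actual computation is left as a comment, a cell is taken to depend on its own previous version as well as on its two neighbors (the algorithm text only mentions the neighbors), and run_all(), find_workable(), all_done() and MAX_THREADS are names introduced for the example.

```c
#include <pthread.h>
#include <stdbool.h>

#define N_DATA      5
#define N_VERSIONS  11              /* versions 0..10 */
#define MAX_THREADS 16

enum state { NOT_STARTED, IN_PROGRESS, COMPLETED };

static enum state A[N_DATA][N_VERSIONS];    /* access matrix, guarded by mutex */
static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t condvar = PTHREAD_COND_INITIALIZER;

/* Caller holds mutex. True once every item has reached the last version. */
static bool all_done(void) {
    for (int i = 0; i < N_DATA; i++)
        if (A[i][N_VERSIONS - 1] != COMPLETED)
            return false;
    return true;
}

/* Caller holds mutex. Finds a cell whose inputs at version j-1 are ready. */
static bool find_workable(int *pi, int *pj) {
    for (int j = 1; j < N_VERSIONS; j++) {
        for (int i = 0; i < N_DATA; i++) {
            int left  = (i + N_DATA - 1) % N_DATA;  /* circular neighbors */
            int right = (i + 1) % N_DATA;
            if (A[i][j] == NOT_STARTED &&
                A[i][j - 1] == COMPLETED &&         /* assumption: own previous version */
                A[left][j - 1] == COMPLETED &&
                A[right][j - 1] == COMPLETED) {
                *pi = i;
                *pj = j;
                return true;
            }
        }
    }
    return false;
}

static void *worker(void *arg) {
    (void)arg;
    pthread_mutex_lock(&mutex);
    while (!all_done()) {
        int i = 0, j = 0;
        if (find_workable(&i, &j)) {
            A[i][j] = IN_PROGRESS;              /* claim the cell under the lock */
            pthread_mutex_unlock(&mutex);
            /* ... compute DATA[i][j] here, without holding the lock ... */
            pthread_mutex_lock(&mutex);
            A[i][j] = COMPLETED;
            pthread_cond_broadcast(&condvar);   /* other cells may be workable now */
        } else {
            pthread_cond_wait(&condvar, &mutex);
        }
    }
    pthread_mutex_unlock(&mutex);
    return NULL;
}

/* Runs up to nthreads workers to completion; returns the number of completed cells. */
int run_all(int nthreads) {
    pthread_t tid[MAX_THREADS];
    int started = 0, completed = 0;
    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;
    for (int i = 0; i < N_DATA; i++) {
        A[i][0] = COMPLETED;                    /* version 0 is the given input */
        for (int j = 1; j < N_VERSIONS; j++)
            A[i][j] = NOT_STARTED;
    }
    for (int t = 0; t < nthreads; t++)
        if (pthread_create(&tid[started], NULL, worker, NULL) == 0)
            started++;
    for (int t = 0; t < started; t++)
        pthread_join(tid[t], NULL);
    for (int i = 0; i < N_DATA; i++)
        for (int j = 0; j < N_VERSIONS; j++)
            completed += (A[i][j] == COMPLETED);
    return completed;
}
```

A thread only waits when nothing is workable, which means some other cell is still in progress; the broadcast sent when that cell completes wakes the waiters so they can search again or notice that the last column is done.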
I wanna implement divide and conquer using pthread, but I don't know what will happen if I create more threads in a thread.
From my understanding, if the machine has a 2-core processor, it can only process 2 threads at the same time. If there are more than 2 threads, other threads have to wait for the resources, so if I create more and more threads while I'm going deeper, actually it may not increase the speed of the algorithm since only 2 threads can be processed at the same time.
I did some research online, and it seems the threads at the upper levels can be inactive while only the ones at the deepest level stay active. How can I achieve this? Also, if an upper thread stays inactive, does it affect the lower threads?
There are two basic types of thread: detached and joinable.
A joinable thread is one whose termination you may wait for (and whose result you may retrieve) using pthread_join.
Using more threads than there are cores can help or hurt -- depends on your program! It's often good to minimize or eliminate competition for resources with multithreading. Throwing too many threads at a program can actually slow the process down. However, you would likely have idle CPU time if the number of cores matches the thread count and one of the threads is waiting on disk IO (provided nothing significant is happening in other processes).
threads at upper level can be inactive, only the ones at the deepest level stay active. how to achieve this?
Using joinable threads, you can accomplish the nested thread approach you have outlined, and this is demonstrated in several tutorials. The basic flow is that a thread will create one or more workers, and wait for them to exit using pthread_join. However, alternatives such as tasks and thread pools are preferable in the majority of cases.
Nevertheless, it's unlikely that this approach is the best for execution, because it does not correlate (well) with hardware and scheduling operations, particularly as depth and width of your program grows.
if a upper thread stay inactive, won't affect the lower thread?
Yes. The typical problem, however, is that the work/threads are not constrained. Using the approach you have outlined, it's easy to spawn many threads and have an illogically high number of threads for the work which must be executed on a limited number of cores. Consequently, your program would waste a bunch of time context switching and waiting for threads to complete. Creating many threads can also waste/reserve a significant amount of resources, especially if they are short-lived and/or idle/waiting.
so if i create more and more threads while im going deeper, actually it may not increase the speed of the algorithm since only 2 threads can be processed at the same time.
Which suggests creating threads using this approach is flawed. You may want to create a few threads instead, and use a task based approach -- where each thread requests and executes tasks from a collection. Creating a thread takes a good bit of time and resources.
If you are trying to do a two-way divide and conquer, spawning two children and waiting for them to finish, you probably need something like:
void *
routine (void * argument)
{
  pthread_t left_child, right_child;
  void *left_arg, *right_arg;
  void *left_return_val, *right_return_val;
  /* divide (f is a placeholder for however you split the problem) */
  left_arg = f (argument);
  right_arg = f (argument);
  /* conquer */
  pthread_create (&left_child, NULL, routine, left_arg);
  pthread_create (&right_child, NULL, routine, right_arg);
  /* wait for 'children' */
  pthread_join (left_child, &left_return_val);
  pthread_join (right_child, &right_return_val);
  /* merge results & return */
}
A slight improvement would be this, where instead of sleeping, the 'parent thread' does the job of the right child synchronously, and spawns one less thread:
void *
routine (void * argument)
{
  pthread_t left_child;
  void *left_arg, *right_arg;
  void *left_return_val, *right_return_val;
  /* divide */
  left_arg = f (argument);
  right_arg = f (argument);
  /* conquer */
  pthread_create (&left_child, NULL, routine, left_arg);
  /* do the right_child's work yourself */
  right_return_val = routine (right_arg);
  /* wait for 'left child' */
  pthread_join (left_child, &left_return_val);
  /* merge results & return */
}
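As a concrete and runnable instance of the "parent does the right half itself" pattern, here is a sketch that sums an array. It is only an illustration under assumptions: the problem (a sum), the struct task layout, parallel_sum() and the MAX_DEPTH cutoff are all invented for the example, and the depth cap is one simple way to stop spawning threads once there are about as many as you want.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_DEPTH 2     /* assumption: only the first 2 levels spawn threads */

struct task {
    const int *data;    /* sub-array to sum */
    size_t     len;
    int        depth;   /* recursion depth of this task */
    long       result;  /* filled in by sum_range() */
};

static void *sum_range(void *p) {
    struct task *t = p;
    if (t->len <= 1 || t->depth >= MAX_DEPTH) {
        /* base case: too small or too deep, so sum sequentially */
        t->result = 0;
        for (size_t i = 0; i < t->len; i++)
            t->result += t->data[i];
        return NULL;
    }

    /* divide */
    size_t half = t->len / 2;
    struct task left  = { t->data,        half,          t->depth + 1, 0 };
    struct task right = { t->data + half, t->len - half, t->depth + 1, 0 };

    /* conquer: one child thread for the left half, parent does the right */
    pthread_t child;
    int spawned = (pthread_create(&child, NULL, sum_range, &left) == 0);
    if (!spawned)
        sum_range(&left);           /* could not create a thread: do it here */
    sum_range(&right);
    if (spawned)
        pthread_join(child, NULL);  /* left stays valid until the child is joined */

    /* merge */
    t->result = left.result + right.result;
    return NULL;
}

long parallel_sum(const int *data, size_t len) {
    struct task root = { data, len, 0, 0 };
    sum_range(&root);
    return root.result;
}
```

With MAX_DEPTH set to 2, at most four threads (the caller plus three children) do leaf work at the same time, regardless of the array size.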
However, when you go N levels deep, you have quite a few children. The speedup obtained really depends on how much time the CPU spends on real processing, and how much time it waits for I/O etc. If you know that on a machine with P cores, you can only get good speedup with, say kP threads, then instead of spawning threads as above, you could set up a 'worker pool' of kP threads, and keep reusing them. This way, once kP threads have been spawned, you won't spawn more:
THREAD_POOL pool = new_thread_pool (k * P); /* I made this function up */
void *
routine (void * argument)
{
/* divide */
left_arg = f (argument);
right_arg = f (argument);
/* conquer */
left_thread = get_worker (pool); /* Blocks until a thread is free */
/* get left_thread to do processing for you */
right_thread = get_worker (pool); /* Blocks until a thread is free */
/* get right_thread to do processing for you */
/* wait for the workers' results; pool threads stay alive, so this is not
   pthread_join (wait_worker is made up as well) */
left_return_val = wait_worker (left_thread);
right_return_val = wait_worker (right_thread);
/* return the workers */
put_worker (pool, left_thread);
put_worker (pool, right_thread);
/* merge results & return */
}
You should be able to create many more threads than you have cores in your system. The operating system will make sure that every thread gets part of the CPU to do its work.
However, there is [probably] an upper limit to the number of threads you can create (check your OS documentation).
So if you create 5 threads in a system with 2 cores, then every thread will get about 40% of a core (on average). It's not that a thread has to wait until another thread has completely finished. Unless you use locks, of course.
When you use locks to protect data from being changed or accessed by multiple threads, a number of problems can pop up. Typical problems are:
deadlocks: thread 1 waits on something that is locked by thread 2; thread 2 waits on something that is locked by thread 1
lock convoy: multiple threads are all waiting on the same lock
priority inversion: thread 1 has higher priority than thread 2, but since thread 2 holds a lock most of the time, thread 1 still has to wait on thread 2
I found this page (http://ashishkhandelwal.arkutil.com/index.php/csharp-c/issues-with-multithreaded-programming-part-1/), which could be a good start on multithreaded programming.
I have three threads, one thread is the main and the other two are worker threads. The first thread, when there is work to be done, wakes up one of the two threads. Each thread, when awakened, performs some computation, and while doing this, if it finds more work to do, it can wake up the other worker thread or simply decide to do the job by itself (by adding work to a local queue, for example).
While the worker threads have work to do, the main thread must wait for the work to be done. I have implemented this with condition variables as follows (the code reported here hides a lot of details, please ask if anything is unclear):
MAIN THREAD (pseudocode):
//this function can be called from main several times. It blocks the main thread until the work is done.
void new_work(){
//signaling to worker threads if work is available
//Now, the threads have been awakened, it's time to sleep till they have finished.
pthread_mutex_lock(&main_lock);
while (work > 0) //work is a shared atomic integer, incremented each time there's work to do and decremented when finished executing some work unit
pthread_cond_wait(&main_cond, &main_lock);
pthread_mutex_unlock(&main_lock);
}
WORKER THREADS:
while (1){
pthread_mutex_lock(&main_lock);
if (work == 0)
pthread_cond_signal(&main_cond);
pthread_mutex_unlock(&main_lock);
//code to let the worker thread wait again -- PROBLEM!
while (I have work to do, in my queue){
do_work()
}
}
Here is the problem: when a worker thread wakes up the main thread, I can't be sure that the worker thread has already called a wait to put itself in a waiting state for new work. Even if I implement this wait with another condition variable, it can happen that the main thread wakes up and does some work until it reaches a point at which it has to wake up a thread that has not called wait yet... and this can lead to bad results. I've tried several ways to solve this issue but I couldn't find a solution; maybe there is an obvious way to solve it but I'm missing it.
Can you provide a scheme to solve this kind of problem? I'm using the C language and I can use whatever synchronization mechanism you think can be suited, like pthreads or posix semaphores.
Thanks
The usual way to handle this is to have a single work queue and protect it from overflow and underflow. Something like this (where I have left off the "pthread_" prefixes):
mutex_t queue_mutex;
cond_t queue_not_full, queue_not_empty;
void enqueue_work(Work w) {
mutex_lock(&queue_mutex);
while (queue_full())
cond_wait(&queue_not_full, &queue_mutex);
add_work_to_queue(w);
cond_signal(&queue_not_empty);
mutex_unlock(&queue_mutex);
}
Work dequeue_work() {
mutex_lock(&queue_mutex);
while (queue_empty())
cond_wait(&queue_not_empty, &queue_mutex);
Work w = remove_work_from_queue();
cond_signal(&queue_not_full);
mutex_unlock(&queue_mutex);
return w;
}
Note the symmetry between these functions: enqueue <-> dequeue, empty <-> full, not_empty <-> not_full.
This provides a thread-safe bounded-size queue for any number of threads producing work and any number of threads consuming work. (Actually, it is sort of the canonical example for the use of condition variables.) If your solution does not look exactly like this, it should probably be pretty close...
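For completeness, here is one way the queue helpers left abstract above could be filled in, as a sketch with full pthread_ names. The storage is assumed to be a fixed-size ring buffer, Work is assumed to be a plain int, and CAPACITY and producer() are names chosen only for the example.

```c
#include <pthread.h>

#define CAPACITY 4

typedef int Work;                   /* assumption: a work item is just an int */

static Work buf[CAPACITY];          /* ring buffer, guarded by queue_mutex */
static int head = 0, count = 0;
static pthread_mutex_t queue_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t queue_not_full = PTHREAD_COND_INITIALIZER;
static pthread_cond_t queue_not_empty = PTHREAD_COND_INITIALIZER;

void enqueue_work(Work w) {
    pthread_mutex_lock(&queue_mutex);
    while (count == CAPACITY)                   /* queue_full() */
        pthread_cond_wait(&queue_not_full, &queue_mutex);
    buf[(head + count) % CAPACITY] = w;         /* add at the tail */
    count++;
    pthread_cond_signal(&queue_not_empty);
    pthread_mutex_unlock(&queue_mutex);
}

Work dequeue_work(void) {
    pthread_mutex_lock(&queue_mutex);
    while (count == 0)                          /* queue_empty() */
        pthread_cond_wait(&queue_not_empty, &queue_mutex);
    Work w = buf[head];                         /* remove from the head */
    head = (head + 1) % CAPACITY;
    count--;
    pthread_cond_signal(&queue_not_full);
    pthread_mutex_unlock(&queue_mutex);
    return w;
}

/* Producer used only to drive the example: enqueues 1..10. */
void *producer(void *arg) {
    (void)arg;
    for (Work w = 1; w <= 10; w++)
        enqueue_work(w);
    return NULL;
}
```

Since CAPACITY is smaller than the ten items produced, the producer is forced to block on queue_not_full until a consumer makes room, which exercises both condition variables.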
If you want the main thread to distribute work to the other two, then wait until both threads have completed their work before moving on, you might be able to accomplish this with a barrier.
A barrier is a synchronization construct that you can use to make threads wait at a certain point in your code until a set number of threads are all ready to move on. Essentially, you initialize a pthread barrier, saying that x number of threads must wait on it before any are allowed to continue. As each thread finishes its work and is ready to go on, it will wait on the barrier, and once x number of threads have reached the barrier, they are all allowed to continue.
In your case, you might be able to do something like:
pthread_barrier_t barrier;
pthread_barrier_init(&barrier, NULL, 3);
master()
{
while (work_to_do) {
put_work_on_worker_queues();
pthread_barrier_wait(&barrier);
}
}
worker()
{
while(1) {
while (work_on_my_queue()) {
do_work();
}
pthread_barrier_wait(&barrier);
}
}
This should make your main thread give out work, then wait for both worker threads to complete the work they were given (if any) before moving on.
Could you have a "new job" queue that is managed by the main thread? The main thread could dish out one job at a time to each worker thread. The main thread would also listen for completed jobs from the workers. If a worker thread finds a new job that needs doing, it just adds it to the "new job" queue and the main thread will distribute it.
Pseudocode:
JobQueue NewJobs;
Job JobForWorker[NUM_WORKERS];
workerthread()
{
while(wait for new job)
{
do job (this may include adding new jobs to NewJobs queue)
signal job complete to main thread
}
}
main thread()
{
while(whatever)
{
wait for job completion on any worker thread
now a worker thread is free put a new job on it
}
}
I believe that what you have here is a variation on the producer-consumer problem. What you are doing is writing up an ad-hoc implementation of a counting semaphore (one that is used to provide more than just mutual exclusion).
If I've read your question right, what you are trying to do is have the worker threads block until there is a unit of work available and then perform a unit of work once it becomes available. Your issue is with the case where there is too much work available and the main thread tries to unblock a worker that is already working. I would structure your code as follows.
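sem_t main_sem;
pthread_mutex_t main_mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t main_cond = PTHREAD_COND_INITIALIZER;
int work_done = 0; /* predicate for main_cond, guarded by main_mutex */
sem_init(&main_sem, 0, 0); /* call once at startup, e.g. in main() */
void new_work() {
sem_post(&main_sem); /* one unit of work is now available */
pthread_mutex_lock(&main_mutex);
while (!work_done) /* wait until the worker reports completion */
pthread_cond_wait(&main_cond, &main_mutex);
work_done = 0;
pthread_mutex_unlock(&main_mutex);
}
void do_work() {
while (1) {
sem_wait(&main_sem); /* block until a unit of work is available */
// do stuff
// do more stuff
pthread_mutex_lock(&main_mutex);
work_done = 1;
pthread_cond_signal(&main_cond); /* wake the main thread */
pthread_mutex_unlock(&main_mutex);
}
}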
Now, if the worker threads generate more work, they can simply sem_post to the semaphore and defer the pthread_cond_signal until all the work is done.
Note however, if you actually need the main thread to always block when the worker is working, it's not useful to push the work to another thread when you could just call a function that does the work.
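To illustrate why a counting semaphore avoids the lost-wakeup problem from the question, here is a small sketch that uses semaphores in both directions. The names work_sem, done_sem, NUM_UNITS, and run_semaphore_demo() are hypothetical, and the code assumes a platform with unnamed POSIX semaphores (sem_init), such as Linux.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>

#define NUM_UNITS 10  /* hypothetical number of work units */

static sem_t work_sem;  /* counts units of work that are available */
static sem_t done_sem;  /* counts units of work that are completed */
static int processed;   /* only touched by the worker while it runs */

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < NUM_UNITS; i++) {
        sem_wait(&work_sem);  /* blocks only if no unit is pending */
        processed++;          /* stand-in for the real work */
        sem_post(&done_sem);  /* report one completed unit */
    }
    return NULL;
}

/* Posts all work before the worker exists, then waits for completion.
   Returns the number of units the worker processed. */
int run_semaphore_demo(void) {
    pthread_t t;
    int rc;
    processed = 0;
    rc = sem_init(&work_sem, 0, 0);
    assert(rc == 0);
    rc = sem_init(&done_sem, 0, 0);
    assert(rc == 0);
    /* Every post is remembered in the semaphore's count, so it does not
       matter that nobody is waiting yet. */
    for (int i = 0; i < NUM_UNITS; i++)
        sem_post(&work_sem);
    rc = pthread_create(&t, NULL, worker, NULL);
    assert(rc == 0);
    for (int i = 0; i < NUM_UNITS; i++)
        sem_wait(&done_sem);  /* one wait per completed unit */
    pthread_join(t, NULL);
    sem_destroy(&work_sem);
    sem_destroy(&done_sem);
    return processed;
}
```

Unlike pthread_cond_signal, which has no effect when no thread is waiting, sem_post always increments the count, so a worker that arrives late still sees the pending work.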
I'm creating n threads and then starting their execution after a barrier breakdown.
In global data space:
int bkdown = 0;
In main():
pthread_barrier_init(&bar,NULL,n);
for(i=0;i<n;i++)
{
pthread_create(&threadIdArray[i],NULL,runner,NULL);
if(i==n-2)printf("breakdown imminent!\n");
if(i==n-1)printf("breakdown already occurred!\n");
}
In thread runner function:
void *runner(void *param)
{
pthread_barrier_wait(&bar);
if(bkdown==0){bkdown=1;printf("barrier broken down!\n");}
...
pthread_exit(NULL);
}
Expected order:
breakdown imminent!
barrier broken down!
breakdown already occurred!
Actual order: (tested repeatedly)
breakdown imminent!
breakdown already occurred!
barrier broken down!
Could someone explain why I am not getting the "broken down" message before the "already occurred" message?
The order in which threads are run is dependent on the operating system. Just because you start a thread doesn't mean the OS is going to run it immediately.
If you really want to control the order in which threads are executed, you have to put some kind of synchronization in there (with mutexes or condition variables.)
for(i=0;i<n;i++)
{
pthread_create(&threadIdArray[i],NULL,runner,NULL);
if(i==n-2)printf("breakdown imminent!\n");
if(i==n-1)printf("breakdown already occurred!\n");
}
Nothing stops this loop from executing until i == n-1. pthread_create() just fires off a thread to be run. It doesn't wait for it to start or end. Thus you're at the mercy of the scheduler, which might decide to continue executing your loop, or switch to one of the newly created threads (or do both, on an SMP system).
You're also initializing the barrier to n, so in any case none of the threads will get past the barrier until you've created all of them.
In addition to the answers of nos and Starkey, you have to take into account another serialization in your code that is often neglected: you are doing IO on the same FILE variable, namely stdout.
Access to that variable is mutexed internally, and the order in which your n+1 threads (including your calling thread) get access to that mutex is implementation-defined; treat it as essentially random in your case.
So the order in which you get your printf output is the order in which your threads pass through these wormholes.
You can get the expected order in one of two ways:
Create each thread with a higher priority than the main thread. With a priority-based (real-time) scheduling policy, the new thread should then run immediately after creation and wait on the barrier.
Move the "breakdown imminent!\n" print before the pthread_create() and call sched_yield() after every pthread_create(). This gives the newly created thread a chance to run, although the scheduler does not strictly guarantee it.