threads inside a thread? - c

I want to implement divide and conquer using pthreads, but I don't know what will happen if I create more threads inside a thread.
From my understanding, if the machine has a 2-core processor, it can only run 2 threads at the same time. If there are more than 2 threads, the others have to wait for the CPU. So if I create more and more threads as I go deeper, it may not actually increase the speed of the algorithm, since only 2 threads can run at the same time.
I did some research online, and it seems the threads at the upper levels can be inactive while only the ones at the deepest level stay active. How can I achieve this? Also, if an upper thread stays inactive, does it affect the lower threads?

There are two basic types of pthreads: detached and joinable.
A joinable thread is one whose termination you may wait for (and whose result you may access) using pthread_join.
Using more threads than there are cores can help or hurt -- it depends on your program! It's often good to minimize or eliminate competition for resources when multithreading, and throwing too many threads at a program can actually slow it down. On the other hand, if the thread count exactly matches the core count, you would likely have idle CPU time whenever one of the threads waits on disk IO (provided nothing significant is happening in other processes).
threads at upper level can be inactive, only the ones at the deepest level stay active. how to achieve this?
Using joinable threads, you can accomplish the nested-thread approach you have outlined, and it is demonstrated in several tutorials. The basic flow is that a thread creates one or more workers and waits for them to exit using pthread_join. However, alternatives such as tasks and thread pools are preferable in the majority of cases.
Nevertheless, this approach is unlikely to execute well, because it does not map (well) onto the hardware and the scheduler, particularly as the depth and width of your program grow.
if a upper thread stay inactive, won't affect the lower thread?
Yes. The typical problem, however, is that the work/threads are not constrained. Using the approach you have outlined, it's easy to spawn many threads and have an illogically high number of threads for the work which must be executed on a limited number of cores. Consequently, your program would waste a bunch of time context switching and waiting for threads to complete. Creating many threads can also waste/reserve a significant amount of resources, especially if they are short-lived and/or idle/waiting.
so if i create more and more threads while im going deeper, actually it may not increase the speed of the algorithm since only 2 threads can be processed at the same time.
Which suggests that creating threads this way is flawed. You may want to create a few threads instead and use a task-based approach, where each thread requests and executes tasks from a shared collection. Creating a thread takes a good bit of time and resources.

If you are trying to do a two-way divide and conquer, spawning two children and waiting for them to finish, you probably need something like:
void *
routine (void *argument)
{
    pthread_t left_child, right_child;
    void *left_return_val, *right_return_val;

    /* divide */
    left_arg = f (argument);
    right_arg = f (argument);
    /* conquer */
    pthread_create (&left_child, NULL, routine, left_arg);
    pthread_create (&right_child, NULL, routine, right_arg);
    /* wait for the 'children' */
    pthread_join (left_child, &left_return_val);
    pthread_join (right_child, &right_return_val);
    /* merge results & return */
}
A slight improvement would be the following, where instead of blocking right away, the 'parent' thread does the right child's job synchronously and spawns one thread fewer:
void *
routine (void *argument)
{
    pthread_t left_child;
    void *left_return_val, *right_return_val;

    /* divide */
    left_arg = f (argument);
    right_arg = f (argument);
    /* conquer */
    pthread_create (&left_child, NULL, routine, left_arg);
    /* do the right child's work yourself */
    right_return_val = routine (right_arg);
    /* wait for the 'left child' */
    pthread_join (left_child, &left_return_val);
    /* merge results & return */
}
However, when you go N levels deep, you have quite a few children. The speedup obtained really depends on how much time the CPU spends on real processing versus waiting for I/O etc. If you know that on a machine with P cores you only get good speedup with, say, k*P threads, then instead of spawning threads as above, you could set up a 'worker pool' of k*P threads and keep reusing them. This way, once k*P threads have been spawned, you won't spawn more:
THREAD_POOL pool = new_thread_pool (k * P); /* I made this function up */

void *
routine (void *argument)
{
    /* divide */
    left_arg = f (argument);
    right_arg = f (argument);
    /* conquer */
    left_worker = get_worker (pool);  /* blocks until a thread is free */
    /* hand the left half to left_worker */
    right_worker = get_worker (pool); /* blocks until a thread is free */
    /* hand the right half to right_worker */
    /* wait for the workers to finish (wait_worker is made up, too) */
    wait_worker (left_worker, &left_return_val);
    wait_worker (right_worker, &right_return_val);
    /* return the workers */
    put_worker (pool, left_worker);
    put_worker (pool, right_worker);
    /* merge results & return */
}

You should be able to create many more threads than you have cores in your system. The operating system will make sure that every thread gets some share of the CPU to do its work.
However, there is [probably] an upper limit to the number of threads you can create (check your OS documentation).
So if you create 5 threads in a system with 2 cores, then every thread will get about 40% of a core (on average). It's not that a thread has to wait until another thread has completely finished -- unless you use locks, of course.
When you use locks to protect data from being changed or accessed by multiple threads, a number of problems can pop up. Typical problems are:
deadlock: thread 1 waits on something that is locked by thread 2, while thread 2 waits on something that is locked by thread 1
lock convoy: multiple threads are all waiting on the same lock
priority inversion: thread 1 has higher priority than thread 2, but since thread 2 holds a lock most of the time, thread 1 still has to wait on thread 2
I found this page (http://ashishkhandelwal.arkutil.com/index.php/csharp-c/issues-with-multithreaded-programming-part-1/), which could be a good start on multithreaded programming.

Related

Do any mechanisms, similar to pthread_barrier, exist that allow thread communication across functions?

I'm trying to find out if there exists some sort of barrier-like mechanism to prevent threads in one function from proceeding until some condition has been met.
I attempted to implement this by the following:
function1 (multiple threads are here):
{...
    while (suspended) {}  /* spins while suspended == 1; could also do
                             while (!__sync_bool_compare_and_swap(&suspended, 1, 0)) {} */
...}
function2 (one thread is here):
{... updates some resource the function1 threads need ...
    suspended = 0;            // clears the suspended flag; could also be done atomically
    while (!donewithwork) {}  // spins until function1's work is done
    return;
}
This implementation typically doesn't work and causes a full program crash. Looking in gdb wasn't too enlightening: the global variable "suspended" is decremented to 0, but in both cases all threads are stuck in the while loops.
Online, I found that there is a "pthread_cond" family of functions devoted to this, but after talking to some people, I was told that the repeated lock and release ends up having the same cost (in terms of system calls) as the following implementation:
// init multi and one to 0, the initial locked state
void* helper(void*)
{
    sem_wait(multi);  // wait for `one` to finish its work before starting
    cnt++;
    ...
    cnt--;
    if (cnt == 0) sem_post(one);  // release `one` when the work is finished
}
compute_histogram(void*)
{
    ... initialize the globals that the `multi` threads will be using ...
    for (all threads) { sem_post(multi); }  // releases every thread waiting at multi
    sem_wait(one);  // don't return until the helpers' work has finished
    return;
}
This gives a significant performance increase, 6-8x over the single-threaded version, but the system calls required by the semaphores have significant overhead.
I'm wondering if there exists a type of "pthread_barrier" type mechanism that would release the threads upon a condition being met (by some other thread).

Priority based multithreading?

I have written code for two threads, where one is assigned priority 20 (lower) and the other priority 10 (higher). Upon executing my code, 70% of the time I get the expected result, i.e. high_prio (with priority 10) executes first and then low_prio (with priority 20).
Why does my code not produce the expected result in 100% of executions? Is there any conceptual mistake that I am making?
void *low_prio(){
Something here;
}
void *high_prio(){
Something here;
}
int main(){
Thread with priority 10 calls high_prio;
Thread with priority 20 calls low_prio;
return 0;
}
Is there any conceptual mistake that I am doing?
Yes — you have an incorrect expectation regarding what thread priorities do. Thread priorities are not meant to force one thread to execute before another thread.
In fact, in a scenario where there is no CPU contention (i.e. where there are always at least as many CPU cores available as there are threads that currently want to execute), thread priorities will have no effect at all -- because there would be no benefit to forcing a low-priority thread not to run when there is a CPU core available for it to run on. In this no-contention scenario, all of the threads will get to run simultaneously and continuously for as long as they want to.
The only time thread priorities may make a difference is when there is CPU contention -- i.e. there are more threads that want to run than there are CPU cores available to run them. At that point, the OS's thread-scheduler has to decide which thread gets to run and which thread has to wait for a while. In this instance, thread priorities can indicate to the scheduler which thread it should prefer to allow to run.
Note that it's even more complicated than that, however -- for example, in your posted program, both of your threads are calling printf() rather a lot, and printf() invokes I/O, which means that the thread may be temporarily put to sleep while the I/O (e.g. to your Terminal window, or to a file if you have redirected stdout to file) completes. And while that thread is sleeping, the thread-scheduler can take advantage of the now-available CPU core to let another thread run, even if that other thread is of lower priority. Later, when the I/O operation completes, your high-priority thread will be re-awoken and re-assigned to a CPU core (possibly "bumping" a low-priority thread off of that core in order to get it).
Note that inconsistent results are normal for multithreaded programs -- threads are inherently non-deterministic, since their execution patterns are determined by the thread-scheduler's decisions, which in turn are determined by lots of factors (e.g. what other programs are running on the computer at the time, the system clock's granularity, etc).

How implement a barrier using semaphores

I have the following problem to solve:
Consider an application with three types of threads: Calculus-A, Calculus-B, and Finalization. Whenever a Calculus-A thread finishes, it calls the routine endA(), which returns immediately. Whenever a Calculus-B thread finishes, it calls the routine endB(), which returns immediately. Finalization threads call the routine wait(),
which returns only after two Calculus-A threads and two Calculus-B threads have completed. In other words, exactly 2 completions of Calculus-A and 2 completions of Calculus-B allow one Finalization thread to continue.
There is an undetermined number of threads of the 3 types, and the order in which the routines are called is not known. Finalization threads are served in their order of arrival.
Implement routines endA(), endB() and wait() using semaphores. Besides the variables initialization, the only possible operations are P and V. Solutions with busy-waiting are not acceptable.
Here's is my solution:
semaphore calcA = 2;
semaphore calcB = 2;
semaphore wait = -3;

void endA()
{
    P(calcA);
    V(wait);
}

void endB()
{
    P(calcB);
    V(wait);
}

void wait()
{
    P(wait);
    P(wait);
    P(wait);
    P(wait);
    V(calcA);
    V(calcA);
    V(calcB);
    V(calcB);
}
I believe there will be a deadlock due to wait's initialization to a negative value, and also if wait() executes before endA() and endB(). Is there any other solution?
I tend to view semaphore problems as problems where one must identify "sources of waiting" and define for each a semaphore and a protocol for their access.
With that in mind, the "sources of waiting" are
Completions of CalcA
Completions of CalcB
Maybe, if I understood this right, a wait on whole completion groups, each consisting of two CalcAs and two CalcBs. I say maybe because I'm not sure what the requirement that Finalization threads be served in order of arrival means.
Completions of CalcA and CalcB should therefore increment their respective counters. At the other end, one Finalization thread gains exclusive access to the counters and waits in any order for the needed number of completions to constitute a completion group. It then unlocks access to the next group.
My code is below, although since I'm unfamiliar with the Dutch P and V, I will use take()/give() instead.
semaphore calcA = 0;
semaphore calcB = 0;
semaphore groupSem = 1;

void endA() {
    give(calcA);
}

void endB() {
    give(calcB);
}

void wait() {
    take(groupSem);
    take(calcA);
    take(calcA);
    take(calcB);
    take(calcB);
    give(groupSem);
}
The groupSem semaphore ensures all-or-nothing: the thread that enters the critical section gets the next two completions of each of CalcA and CalcB. If groupSem weren't there, the first thread to enter wait could take two As and block, then be overtaken by another thread that grabs two As and two Bs and runs away with them.
An even worse problem without groupSem: the second thread takes two As and one B and then blocks, and the first thread grabs the second B. If the result of the finalization is what enables further runs of Calculus-A and Calculus-B, you may have a deadlock, because there may be no further opportunity for A and B instances to complete, leaving both finalization threads hanging and unable to produce more calculation instances.

Managing a variable number of worker threads with graceful exit

I have a boss thread that spawns up to M worker threads. Over the lifetime of the program, workers may be added and removed. When the program-wide shutdown flag is signalled, I want to await the completion of these workers.
Currently, any of the threads can add/remove threads, but that's not strictly a requirement, as long as some thread can initiate a spawn/removal.
What's stopping me from using a counting semaphore or pthread_barrier_wait() is that it expects a fixed number of threads.
I can't loop pthread_join() over all workers either because I'd risk leaking zombie threads that have exited and possibly since then been replaced.
The boss thread itself has no other purpose than spawning the threads initially and making sure that the process exits gracefully.
I've spent days on and off on this problem and cannot come up with something robust and simple; are there any fairly well-established ways to accomplish this with POSIX threads?
1) "Currently, any of the threads can add/remove threads"
and
2) "are there any fairly well-established ways to accomplish this with POSIX threads"
Yes. Don't do (1). Have the boss thread do it.
Or, you can protect the code which spawns threads with a critical section or mutex (I assume you are already doing this). They should check a flag to see if shutdown is in progress, and if it is, don't spawn any more threads.
You can also have a counter of "ideal number of threads" and "actual number of threads" and have threads suicide if they find "ideal > actual". (I.e. they should decrement actual, exit the critical section/mutex, then quit).
When you need to initiate shutdown, use the SAME mutex/section to set the flag. Once done, you know the number of threads cannot increase, so you can use the most recent value.
Indeed, to exit you can just have the boss thread set "ideal" to zero, exit the mutex, and repeatedly sleep 10ms and repeat until all threads have exited. Worst case is you wait an extra 10ms to quit. If that's too much cut it to 1ms.
These are just ideas. The central concept is that all thread creation/removal, and messages about thread creation/removal should be protected by a mutex to ensure that only one thread is adding/removing/querying status at a time. Once you have that in place, there is more than one way to do it...
Threads that want to initiate spawns/removals should ask the boss thread to actually do it for them. Then the boss thread doesn't have to worry about threads it doesn't know about, and you can use one of the simple methods you described in your question.
I'll take the opposite tack from some of the other answers, since I have to do this now and again.
(1) Give every spawned thread access to a single pipe file descriptor either through the data passed through pthread_create or globally. Only the boss thread reads the pipe. Each thread announces its creation and termination to the boss via the pipe by passing its tid and boss adds or removes it from its list and pthread_joins it as appropriate. Boss can block on the pipe w/o having to do anything special.
(2) Do more or less the above with some other mechanism: a global counter and list with an accompanying condition variable to wake up the boss, a message queue, etc.

High performance computing pthread parameters - c++

I'm working on a project where I need to use multiple threads using pthread (C++).
I have a question:
What is the best pthread parameter configuration for when I want to do some high-performance computing in a thread without too many other threads interrupting it?
Currently I'm using something like this:
pthread_t thread;
struct sched_param param;
int sched = SCHED_FIFO;
memset(&param, 0, sizeof(param));
// Set priority to the max value for this policy
// (note: sched_get_priority_max takes the policy as an argument)
param.sched_priority = sched_get_priority_max(sched);
// Create my thread
pthread_create(&thread, NULL, hard_number_crunching, (void *) &structure_passed_to_thread);
// Finally, set the scheduling parameters on the thread
// (pthread_setschedparam takes the pthread_t by value, not a pointer)
pthread_setschedparam(thread, sched, &param);
I was switching "int sched" between SCHED_FIFO and SCHED_RR, but it didn't help me much.
Is it possible to force this thread to run longer on the CPU than it is at the moment?
If you are creating one thread per core, you probably want to set the thread's affinity to prevent it from roaming between cores. This usually improves the performance by ensuring that each thread remains close to its cached data. See:
int sched_setaffinity(pid_t pid,size_t cpusetsize,cpu_set_t *mask);
Note: you should not set the affinity if you are creating more threads than cores! This could cause all kinds of crazy things to happen, deadlocks to mention one.
Let's say your thread is going to take X cycles. It doesn't matter what the priority is or what else the system is doing: it will still take X cycles. If the system is not busy with other things, your thread will run nearly continuously until it finishes. If the system is busy, your thread will be stopped, perhaps even frequently, to let other things run. Eventually your thread will still get its X cycles.
Note that for your purposes here, there are two types of priority: thread priorities (the priority of this thread versus your other threads) and process priorities (the priority of your process/program versus other processes/programs and the operating system). You're setting the thread priority. Do you have other threads of equal or higher priority competing with this one? Are there system processes of higher priority competing with yours? If the answer is no, you'll pretty much get the CPU to yourself.
Also note that if you have multiple processors and/or cores, a single thread can only run on one of them at a time, so your overall system utilization may not reach 100% if you haven't divided up your tasks well.
Both zr.'s and mark's answers helped me.
I also found a way to set affinity for individual thread.
Check this link to see how: pthread_attr_setaffinity_np()
Thanks everyone who helped.
