Is there a way to reuse pthreads? - c

I have a function that is called millions of times, and the work done by this function is multithreaded. Here is the function:
void functionCalledSoManyTimes()
{
    for (int i = 0; i < NUM_OF_THREADS; i++)
    {
        pthread_create(&threads[i], &attr, thread_work_function, (void *)&thread_data[i]);
    }
    // wait
}
I'm creating the threads each time the function is called, and I give each thread its data struct (which is set once at the beginning of the algorithm) to use in thread_work_function. The thread_work_function simply processes a series of arrays, and the thread_data struct contains pointers to those arrays and the indices that each thread is responsible for.
Although multithreading the algorithm in this way did improve the performance by more than 20%, my profiling shows that the repetitive calls to pthread_create are causing a significant overhead.
My question is: Is there a way to achieve my goal without calling pthread_create each time the function is called?
Problem Solved.
Thank you guys, I really appreciate your help! I've written a solution here using your tips.

Just start a fixed set of threads and use an inter-thread communication system (ring buffer, for instance) to pass the data to process.
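A minimal sketch of the fixed-set-of-threads idea, assuming a condition variable and a generation counter in place of a ring buffer. The names pool_start, pool_stop, generation, pending and work_counter are made up for illustration, and incrementing work_counter stands in for the real thread_work_function:

```c
#include <assert.h>
#include <pthread.h>

#define NUM_OF_THREADS 4 /* assumed pool size */

static pthread_t threads[NUM_OF_THREADS];
static int ids[NUM_OF_THREADS];
static int work_counter[NUM_OF_THREADS]; /* stand-in for the real array work */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t start_cv = PTHREAD_COND_INITIALIZER;
static pthread_cond_t done_cv = PTHREAD_COND_INITIALIZER;
static unsigned long generation; /* incremented once per call */
static int pending;              /* workers that have not finished this round */
static int shutting_down;

static void *worker(void *arg)
{
    int id = *(int *)arg;
    unsigned long seen = 0;

    for (;;) {
        /* Sleep until a new round starts or the pool is stopped. */
        pthread_mutex_lock(&lock);
        while (generation == seen && !shutting_down)
            pthread_cond_wait(&start_cv, &lock);
        if (shutting_down) {
            pthread_mutex_unlock(&lock);
            return NULL;
        }
        seen = generation;
        pthread_mutex_unlock(&lock);

        work_counter[id]++; /* placeholder for thread_work_function's job */

        /* Report completion; the last worker wakes the caller. */
        pthread_mutex_lock(&lock);
        if (--pending == 0)
            pthread_cond_signal(&done_cv);
        pthread_mutex_unlock(&lock);
    }
}

/* Create the threads once, before the first call. */
void pool_start(void)
{
    for (int i = 0; i < NUM_OF_THREADS; i++) {
        ids[i] = i;
        int rc = pthread_create(&threads[i], NULL, worker, &ids[i]);
        assert(rc == 0);
        (void)rc;
    }
}

/* Each call only wakes the existing workers and waits for them. */
void functionCalledSoManyTimes(void)
{
    pthread_mutex_lock(&lock);
    pending = NUM_OF_THREADS;
    generation++;
    pthread_cond_broadcast(&start_cv);
    while (pending > 0)
        pthread_cond_wait(&done_cv, &lock);
    pthread_mutex_unlock(&lock);
}

/* Ask the workers to exit and join them. */
void pool_stop(void)
{
    pthread_mutex_lock(&lock);
    shutting_down = 1;
    pthread_cond_broadcast(&start_cv);
    pthread_mutex_unlock(&lock);
    for (int i = 0; i < NUM_OF_THREADS; i++)
        pthread_join(threads[i], NULL);
}
```

Call pool_start() once, then functionCalledSoManyTimes() as often as needed, then pool_stop() at the end. Like the static pool discussed in this thread, this sketch assumes the function is only ever called from one thread at a time.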

Solving the problem gracefully is not so easy. You can use static storage for a thread pool, but then what happens if functionCalledSoManyTimes itself can be called from multiple threads? It's not a good design.
What I would do to handle this sort of situation is create a thread-local storage key with pthread_key_create on the first call (using pthread_once), and store your thread pool there with pthread_setspecific the first time functionCalledSoManyTimes gets called in a given thread. You can provide a destructor function to pthread_key_create, which will get called when the thread exits, and this function can then be responsible for signaling the worker threads in the thread pool to terminate themselves (via pthread_cancel or some other mechanism).
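The key-plus-destructor arrangement might look roughly like this. It is only a sketch: struct pool is an empty placeholder, get_thread_pool, destroy_pool, caller and pools_destroyed are hypothetical names, and the places where a real pool would start and stop its workers are marked in comments:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

struct pool {
    int placeholder; /* a real pool would hold its worker threads here */
};

static pthread_key_t pool_key;
static pthread_once_t pool_once = PTHREAD_ONCE_INIT;
static atomic_int pools_destroyed; /* only here so the effect is observable */

/* Runs when a thread that stored a non-NULL pool exits. */
static void destroy_pool(void *p)
{
    /* a real version would tell the workers to stop and join them here */
    free(p);
    atomic_fetch_add(&pools_destroyed, 1);
}

static void make_key(void)
{
    int rc = pthread_key_create(&pool_key, destroy_pool);
    assert(rc == 0);
    (void)rc;
}

/* Returns the calling thread's pool, creating it on first use. */
struct pool *get_thread_pool(void)
{
    pthread_once(&pool_once, make_key);
    struct pool *p = pthread_getspecific(pool_key);
    if (p == NULL) {
        p = calloc(1, sizeof *p);
        assert(p != NULL);
        /* a real version would start the worker threads here */
        int rc = pthread_setspecific(pool_key, p);
        assert(rc == 0);
        (void)rc;
    }
    return p;
}

/* Example caller: two lookups in one thread yield the same pool. */
void *caller(void *arg)
{
    struct pool *a = get_thread_pool();
    struct pool *b = get_thread_pool();
    *(int *)arg = (a == b);
    return NULL;
}
```

Note that the destructor runs only for threads that terminate as threads (by returning from their start routine or calling pthread_exit); a pool created by the main thread is not cleaned up this way if main simply returns.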

Related

Static vs Dynamic pthread creation

I have a question in regards to creating threads.
Specifically I want to know the difference between looping through thread[i]
and not looping but recalling pthread_create
For Example
A. Initializes 5 threads
for(i=0;i<5;i++){
    pthread_create(&t[i],NULL,&routine,NULL);
}
B. Incoming clients connecting to a server
while(true){
    client_connects_to_server = accept(sock, (struct sockaddr *)&server,
                                       (socklen_t*)&server_len);
    pthread_create(&t,NULL,&routine,NULL); //no iteration
}
Is the proper method of creating threads for incoming clients, to keep track of the connections already made, maybe something like this ?
pthread_create(&t[connections_made+1],NULL,&routine,NULL)
My concern is not being able to handle concurrent pthreads if option B is terminating threads or "re-writing" client connections.
Here is an example where no iteration is done
https://gist.github.com/oleksiiBobko/43d33b3c25c03bcc9b2b
Why is this correct ?
Contrary to your apparent assertion, both your examples call pthread_create() inside loops. In example A, it is a for loop that will iterate a known number of times, whereas in example B, it is a while loop that will iterate an unbounded number of times. I guess the known number vs unbounded number is what you mean by "static" and "dynamic", but that is not a conventional usage of those terms.
In any event, pthread_create() does what it is documented to do. Like any other function, it does not know anything about the context from which it is called other than the arguments passed to it. It can fail, but that's not influenced by looping by the caller, at least not directly. When pthread_create() succeeds, it creates and starts a new thread, which runs until the top-level call to its thread function returns, pthread_exit() is called by the thread, the thread is canceled, or the process terminates.
The main significant difference between your two examples is that A keeps all the thread IDs by recording them in different elements of an array, whereas B overwrites the previous thread ID each time it creates a new thread. But the thread IDs are not the threads themselves. If you lose the ID of a thread then you can no longer join it, among other things, but that doesn't affect the thread's operation, including its interactions with memory, with files, or with synchronization objects such as semaphores. In this regard, example B is more suited to a thread function that will detach the thread in which it is called, so that the joining issue is moot. Example A's careful preservation of all the thread IDs would be pointless for threads that detach themselves, but necessary if the threads need to be joined later.
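To illustrate the example B pattern, here is a rough sketch in which each connection gets a detached thread and its own heap-allocated argument. The names spawn_client_thread, wait_for_handled and handled are invented for this example, the client descriptor is just an int, and the handler does no real socket I/O:

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

static pthread_mutex_t done_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t done_cv = PTHREAD_COND_INITIALIZER;
static int handled; /* number of clients served so far */

static void *routine(void *arg)
{
    int client_fd = *(int *)arg;
    free(arg);       /* the thread owns its argument */
    (void)client_fd; /* a real handler would talk to the client here */

    pthread_mutex_lock(&done_lock);
    handled++;
    pthread_cond_signal(&done_cv);
    pthread_mutex_unlock(&done_lock);
    return NULL;
}

/* Start one detached thread per client; returns 0 or an error number. */
int spawn_client_thread(int client_fd)
{
    pthread_t t; /* safe to forget: a detached thread is never joined */
    int *arg = malloc(sizeof *arg);
    if (arg == NULL)
        return ENOMEM;
    *arg = client_fd;

    int rc = pthread_create(&t, NULL, routine, arg);
    if (rc != 0) {
        free(arg);
        return rc;
    }
    return pthread_detach(t);
}

/* Block until at least n clients have been handled. */
void wait_for_handled(int n)
{
    assert(n >= 0);
    pthread_mutex_lock(&done_lock);
    while (handled < n)
        pthread_cond_wait(&done_cv, &done_lock);
    pthread_mutex_unlock(&done_lock);
}
```

In an accept() loop, spawn_client_thread(client_connects_to_server) would replace the bare pthread_create() call. Overwriting t on every iteration is harmless here because nothing ever needs the old ID again.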

Recursive Multithreading in C

I'm creating a function that searches through a directory, prints out files, and when it runs into a folder, a new thread is created to run through that folder and do the same thing.
It makes sense to me to use recursion then as follows:
pthread_t tid[500];
int i = 0;
void *search(void *dir)
{
    struct dirent *dp;
    DIR *df;
    df = opendir(dir)
    char curFile[100];
    while ((dp = readdir(df)) != NULL)
    {
        sprintf(curFile, "%s/%s",dir,dp->d_name);
        if(isADirectory(curFile))
        {
            pthread_create(&tid[i], NULL, &search, &curFile);
            i++;
        }
        else
        {
            printf("%s\n", curFile);
        }
    }
    pthread_join(&tid[i])
    return 0;
}
When I do this, however, the function ends up trying to access directories that don't actually exist. Initially I had pthread_join() directly after pthread_create(), which worked, but I don't know if you can count that as multithreading since each thread waits for its worker thread to exit before doing anything.
Is the recursive aspect of this problem even possible, or is it necessary for a new thread to call a different function other than itself?
I haven't dealt with multithreading in a while, but if memory serves, threads share resources. That means (in your example) every new thread you make accesses the same variable "i". Now, if those threads only read variable "i" there would be no problem whatsoever (every thread keeps reading ... i = 2 wohoo :D).
But issues arise when threads share resources that are being read and written on.
i = 2
i++
// there are many threads running this code
// and "i" is shared among them, are you sure i = 3?
The problem of reading and writing shared resources is solved with thread synchronization. I recommend reading up on it, since it's too big a topic to cover in one answer.
P.S. I pointed out variable "i" in your code but there may be more such resources since your code doesn't display any attempt at thread synchronization.
Consider your while loop. Inside it you have:
sprintf(curFile, "%s/%s",dir,dp->d_name);
and
pthread_create(&tid[i], NULL, &search, &curFile);
So, you mutate the contents of curFile inside the loop, and you also create a thread to which you are trying to pass the current contents of curFile. This is a spectacular race hazard: there is no guarantee that the new thread will see the intended contents of curFile, since it may have changed in the meantime. You need to duplicate the string and pass the new thread a copy which won't be mutated by the calling thread. The thread is therefore also going to have to be responsible for deallocating the copy, which means either that the search method does exactly that or that you have a second method.
You have another race condition in using i and tid in all threads. As I have suggested in the comment on your question, I think these variables should be method local.
In general I suggest that you read on thread safety and learn about data race hazards before you attempt to use threads. It is usually best to avoid the use of threads unless you really need the extra performance.

Can two Threads use same Thread Procedure?

Is it possible for two threads to use a single function "ThreadProc" as its thread procedure when CreateThread() is used?
HANDLE thread1= CreateThread( NULL, //Choose default security
0, //Default stack size
(LPTHREAD_START_ROUTINE)&ThreadProc,
//Routine to execute. I want this routine to be different each time as I want each thread to perform a different functionality.
(LPVOID) &i, //Thread parameter
0, //Immediately run the thread
&dwThreadId //Thread Id
)
HANDLE thread2= CreateThread( NULL, //Choose default security
0, //Default stack size
(LPTHREAD_START_ROUTINE)&ThreadProc,
//Routine to execute. I want this routine to be different each time as I want each thread to perform a different functionality.
(LPVOID) &i, //Thread parameter
0, //Immediately run the thread
&dwThreadId //Thread Id
)
Would the above code create two threads, each with the same functionality (since the thread procedure for both threads is the same)? Am I doing it correctly?
If it is possible, would there be any synchronization issues since both threads are using the same thread procedure?
Please help me with this. I am really confused and could not find anything over the internet.
It is fine to use the same function as a thread entry point for multiple threads.
However, in the posted code the address of i is passed to both threads. If either thread modifies this memory while the other reads it, there is a race condition on i. The declaration of i is not shown, but it is probably a local variable. That is dangerous, because the threads require i to exist for as long as they use it; if it goes out of scope first, the threads are left with a dangling pointer. It is common practice to dynamically allocate thread arguments and have each thread free its own arguments.
Yes, it is very well possible to have multiple (concurrent) threads that start with the same entry point.
Apart from the fact that the OS/threading library specifies the signature and calls it, there is nothing special about a thread entry point function. It can be used to start off multiple threads with the same caveats as for calling any other function from multiple threads: you need synchronization to access non-atomic shared variables.
Each thread uses its own stack area, but that gets allocated by the OS before the Thread Procedure gets invoked, so by the time the Thread Procedure gets called all the special actions that are needed to create and start a new thread have already taken place.
Whether the threads are using the same code or not is irrelevant. It has no effect whatsoever on synchronization. It behaves precisely the same as if they were different functions. The issues with potential races are the same.
You probably don't want to pass both threads the same pointers. That will likely lead to data races. (Though we'd have to see the code to know for sure.)
Your code is right. There are NOT any synchronization issues between the two threads just because they share a thread procedure. If they need synchronization, it may be because they change the same global variable, not because they use the same thread procedure.
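As a small illustration of one entry point serving two threads with separate arguments, here is a sketch that uses pthreads rather than CreateThread (a deliberate swap, so that it matches the other examples on this page). struct task and run_two_threads are hypothetical names, and multiplying by 10 is a stand-in for real work:

```c
#include <assert.h>
#include <pthread.h>

struct task {
    int id;     /* input: differs per thread */
    int result; /* output: written only by the owning thread */
};

/* One entry point shared by both threads. */
static void *ThreadProc(void *arg)
{
    struct task *t = arg;
    t->result = t->id * 10; /* placeholder for per-thread work */
    return NULL;
}

/* Runs ThreadProc on two threads; returns 0 or an error number. */
int run_two_threads(struct task *a, struct task *b)
{
    assert(a != b); /* sharing one struct would reintroduce the race */

    pthread_t t1, t2;
    int rc = pthread_create(&t1, NULL, ThreadProc, a);
    if (rc != 0)
        return rc;
    rc = pthread_create(&t2, NULL, ThreadProc, b);
    if (rc != 0) {
        pthread_join(t1, NULL);
        return rc;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```

Because each thread touches only its own struct task, no locking is needed even though both run the same function. The behaviour can still differ per thread by branching on a field such as id.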

Multi threading and deadlock

I am making a multi-threaded C program which involves the sharing of a global dynamic integer array between two threads. One thread will keep adding elements to it & the other will independently scan the array & free the scanned elements.
Can anyone suggest how I can do that? What I am doing at the moment creates a deadlock.
Could anyone also provide code for it, or a way to resolve this deadlock, with a full explanation?
For the threads I would use pthread. Compile it with -pthread.
#include <pthread.h>
#include <stddef.h>
int *array;
// pthread start routines take a `void *` argument and return `void *`
void *addfunction(void *p) {
    // add to array
    return NULL;
}
// same signature for the second thread
void *scanfunction(void *p) {
    // scan the array
    return NULL;
}
int main(void) {
    // one pthread_t per thread; pass the same variable to pthread_create() and pthread_join()
    pthread_t addfunction_t, scanfunction_t;
    // start the threads; the third argument is the function each thread runs
    pthread_create(&addfunction_t, NULL, addfunction, NULL);
    pthread_create(&scanfunction_t, NULL, scanfunction, NULL);
    // wait until both threads have finished (leave these out to continue while the threads are running)
    pthread_join(addfunction_t, NULL);
    pthread_join(scanfunction_t, NULL);
    // code after pthread_join runs only once the threads are no longer running
    return 0;
}
Here is a good example/tutorial for pthread: *klick*
In cases like this, you need to look at the frequency and loading generated by each operation on the array. For instance, if the array is being scanned continually but only added to once an hour, it's worthwhile finding a really slow, latency-ridden write mechanism that eliminates the need for read locks. Locking up every access with a mutex would be very unsatisfactory in such a case.
Without details of the 'scan' operation, especially duration and frequency, it's not possible to suggest a thread communication strategy for good performance.
Another thing we don't know is the consequences of failure: it may not matter if a new addition is queued up for a while before actually being inserted, or it may.
If you want a 'Computer Science 101' answer with, quite possibly, very poor performance, lock up every access to the array with a mutex.
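For reference, the "lock up every access" approach could be sketched like this, assuming a fixed-size ring of ints in place of the dynamic array. struct int_queue, queue_init, queue_push, queue_pop, CAPACITY, ITEMS and scanned_sum are illustrative names only:

```c
#include <assert.h>
#include <pthread.h>

#define CAPACITY 16 /* assumed fixed size; the question's array is dynamic */
#define ITEMS 100   /* how many elements the demo threads exchange */

struct int_queue {
    int items[CAPACITY];
    int head;
    int count;
    pthread_mutex_t lock;
    pthread_cond_t not_empty;
    pthread_cond_t not_full;
};

void queue_init(struct int_queue *q)
{
    q->head = 0;
    q->count = 0;
    int rc = pthread_mutex_init(&q->lock, NULL);
    assert(rc == 0);
    rc = pthread_cond_init(&q->not_empty, NULL);
    assert(rc == 0);
    rc = pthread_cond_init(&q->not_full, NULL);
    assert(rc == 0);
    (void)rc;
}

void queue_push(struct int_queue *q, int value)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == CAPACITY) /* waiting releases the lock */
        pthread_cond_wait(&q->not_full, &q->lock);
    q->items[(q->head + q->count) % CAPACITY] = value;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

int queue_pop(struct int_queue *q)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == 0) /* waiting releases the lock */
        pthread_cond_wait(&q->not_empty, &q->lock);
    int value = q->items[q->head];
    q->head = (q->head + 1) % CAPACITY;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return value;
}

static struct int_queue queue;
static long scanned_sum; /* written only by the scanning thread */

/* Adds the values 1..ITEMS. */
void *addfunction(void *p)
{
    (void)p;
    for (int i = 1; i <= ITEMS; i++)
        queue_push(&queue, i);
    return NULL;
}

/* Removes ITEMS values and sums them. */
void *scanfunction(void *p)
{
    (void)p;
    for (int i = 0; i < ITEMS; i++)
        scanned_sum += queue_pop(&queue);
    return NULL;
}
```

A likely reason this shape avoids deadlock is that neither thread ever sleeps while holding the mutex: pthread_cond_wait releases it for the duration of the wait and reacquires it before returning. Call queue_init(&queue) before starting the two threads.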
http://www.liblfds.org
Release 6 contains a lock-free queue.
Compiles out of the box for Windows and Linux.

C linux pthread thread priority

My program has one background thread that fills and swaps the back buffer of a double buffer implementation.
The main thread uses the front buffer to send out data. The problem is that the main thread gets more processing time on average when I run the program. I want the opposite behavior, since filling the back buffer is a more time-consuming process than processing and sending out data to the client.
How can I achieve this with C POSIX pthreads on Linux?
In my experience, if your main thread is getting more CPU in the absence of prioritisation, then this means one of two things:
it actually needs the extra time, contrary to your expectation, or
the background thread is being starved, perhaps due to lock contention.
Changing the priorities will not fix either of those.
Have a look at pthread_setschedparam(): http://www.kernel.org/doc/man-pages/online/pages/man3/pthread_setschedparam.3.html
pthread_setschedparam(pthread_t thread, int policy,
const struct sched_param *param);
You can set the priority in the sched_priority field of the sched_param.
Use pthread_setschedprio(pthread_t thread, int priority). But as with the other approaches (pthread_setschedparam or pthread_attr_t), raising a thread's priority generally requires elevated privileges, so your process usually has to be started as root (much like the nice utility).
You should have a look at the pthread_attr_t struct. It's passed as a parameter to the pthread_create function. It's used to change the thread attributes and can help you to solve your problem.
If you can't solve it that way, you will have to use a mutex to prevent your main thread from accessing your buffer before your other thread swaps it.
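A hedged sketch of the pthread_attr_t route, assuming the SCHED_FIFO real-time policy is what is wanted. create_fifo_thread and fill_back_buffer are made-up names, and on Linux the pthread_create call is expected to fail with EPERM when the process lacks the privilege to use real-time scheduling:

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>

/* Create a thread with an explicit SCHED_FIFO priority.
 * Returns 0 on success or an error number such as EPERM. */
int create_fifo_thread(pthread_t *tid, int priority,
                       void *(*fn)(void *), void *arg)
{
    assert(tid != NULL && fn != NULL);

    pthread_attr_t attr;
    struct sched_param param = { .sched_priority = priority };

    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    /* Without PTHREAD_EXPLICIT_SCHED the new thread inherits the creator's
     * scheduling settings and the policy/priority set below are ignored. */
    rc = pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
    if (rc == 0)
        rc = pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
    if (rc == 0)
        rc = pthread_attr_setschedparam(&attr, &param);
    if (rc == 0)
        rc = pthread_create(tid, &attr, fn, arg);

    pthread_attr_destroy(&attr);
    return rc;
}

/* Stand-in for the background thread that fills the back buffer. */
void *fill_back_buffer(void *arg)
{
    *(int *)arg = 1; /* pretend the buffer was filled */
    return NULL;
}
```

Valid priorities for the policy lie between sched_get_priority_min(SCHED_FIFO) and sched_get_priority_max(SCHED_FIFO). As noted in this thread, priorities do not help if the background thread is really being held up by lock contention.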

Resources