Pthread Join in C? - c

I'm writing code to save text to a binary file, which includes a function to auto-save text to the binary file, as well as a function to print from the binary file, and I need to incorporate pthread locks and join. We were given
pthread_mutext_t mutex;
pthread_t autosavethread;
as global variables, although the instructor didn't talk about what pthread or mutex actually do, so I'm confused about that.
Also, I understand that I need to use locks whenever shared variables are changed or read (in my case it would be the binary file). But at the end of the file I am supposed to use pthread_join, and I don't know what it does or what arguments are supposed to be used in it. I'm guessing mutex and autosavethread are supposed to be closed, or something along the lines of that, but I don't know how to write it. Can anyone help better my understanding?

There are two types of pthread - joinable thread & detached thread.
If you want to let a thread just take a task and go away once the task is done, you need the detached thread;
If you want to have the communication with the created thread when that thread is done with the assigned job, you have to use joinable thread. Basically it's needed when the parent & its created thread need to communicate after the thread is done.
It's very to google what exactly you need to call the pthread APIs and what can be communicated.
But one thing i want to mention here is, for the joinable thread, you have to explicitly call the pthread_join against the created thread. Otherwise, there will be serious memory leaks. When the joinable thread completes its task, the thread seems to exit (On linux, you can check the /proc/PID/task/ folder and once the thread completes, the entry under it will go away), but the resource allocated for this joinable thread, i.e. stack, is still there in the process memory space. As more and more joinable threads created and completing their tasks, the stacks for each thread are just left in process space, unless you explicitly call the pthread_join. Hope that helps, even a bit

Related

What happens if we don't join threads? [duplicate]

I create more than 100 threads from my main() so I just wanted to know that do I need to call pthread_join() before I exit my main().
Also, I do not need the data generated by these threads, basically, all the threads are doing some job independent from main() and other threads.
pthread_join does two things:
Wait for the thread to finish.
Clean up any resources associated with the thread.
If you exit the process without joining, then (2) will be done for you by the OS (although it won't do thread cancellation cleanup, just nuke the thread from orbit), and (1) will not. So whether you need to call pthread_join depends whether you need (1) to happen.
If you don't need the thread to run, then as everyone else is saying you may as well detach it. A detached thread cannot be joined (so you can't wait on its completion), but its resources are freed automatically if it does complete.
Yes if thread is attachable then pthread_join is must otherwise it creates a Zombie thread.
Agree with answers above, just sharing a note from man page of pthread_join.
NOTES
After a successful call to pthread_join(), the caller is guaranteed that the target thread has terminated.
Joining with a thread that has previously been joined results in undefined behavior.
Failure to join with a thread that is joinable (i.e., one that is not detached), produces a "zombie thread". Avoid doing this, since each zombie thread consumes some system resources, and when
enough zombie threads have accumulated, it will no longer be possible to create new threads (or processes).
When you exit, you do not need to join because all other threads and resources will be automatically cleaned up. This assumes that you actually want all the threads to be killed when main exits.
If you don't need to join with a thread, you can create it as a "detached" thread by using pthread_attr_setdetachstate on the attributes before creating the thread. Detached threads cannot be joined, but they don't need to be joined either.
So,
If you want all threads to complete before the program finishes, joining from the main thread makes this work.
As an alternative, you can create the threads as detached, and return from main after all threads exit, coordinating using a semaphore or mutex+condition variable.
If you don't need all threads to complete, simply return from main. All other threads will be destroyed. You may also create the threads as detached threads, which may reduce resource consumption.
By default threads in pthreads library are created as joinable.
Threads may, however, detach, rendering them no longer joinable. Because threads consume system resources until joined, just as processes consume resources until their parent calls wait(), threads that you do not intend to join must be detached, which is a good programming practice.
Of course once the main routine exits, all threading resources are freed.
If we fail to do that(detaching), then, when the thread terminates it produces the thread equivalent of a zombie process. Aside from wasting system resources, if enough thread zombies accumulate, we won't be able to create additional threads.
Per default a thread runs attached, that means the resources it needs are kept in use until the thread is joined.
As from your description noone but the thread itself needs the thread's resources, so you might create the thread detached or detach the thread prior to having it started.
To detach a thread after its creation call pthread_detach().
Anyhow if you want to make sure all threads are gone before the program ends, you should run the threads attached and join them before leaving the main thread (the program).
If you want to be sure that your thread have actually finished, you want to call pthread_join.
If you don't, then terminating your program will terminate all the unfinished thread abruptly.
That said, your main can wait a sufficiently long time until it exits. But then, how can you be sure that it is suffucient?
If your main ends your application ends and your threads die... So you do need to use thread join (or use fork instead).

Use of pthread_join()

I am wondering, what can happen if we do a pthread_create without a pthread_join?
Who will "clean" all the memory of the "non-joined" thread.
When the process terminates, all resources associated with the process cease to exist. (This of course does not include shared resources the process created, like files in the filesystem, shared memory segments, etc.) Until then, unjoined threads will continue to consume resources, potentially calling future calls to pthread_create or even malloc to fail.
Well, assuming that it's an app-lifetime thread that does not need or try to explicitly terminate, the OS will do it when its process is terminated, (on all non-trivial OS).
If the thread is created without using pthread_join then when the main thread completes execution all other threads created in main function will be stopped and hence will not complete executing the whole statements in it.
Look at the documentation of Pthread_join.
It will make the main thread to suspend until the spawned thread completes execution.

c - pthread_create identifier

The first argument of pthread_create() is a thread object which is used to identify the newly-created thread. However, I'm not sure I completely understand the implacations of this.
For instance, I am writing a simple chat server and I plan on using threads. Threads will be coming and going at all times, so keeping track of thread objects could be complicated. However, I don't think I should need to identify individual threads. Could I simply use the same thread object for the first argument of pthread_create() over and over again, or are there other ramifications for this?
If you throw away the thread identifiers by overwriting the same variable with the ID of each thread you create, you'll not be able to use pthread_join() to collect the exit status of threads. So, you may as well make the threads detached (non-joinable) when you call pthread_create().
If you don't make the threads detached, exiting threads will continue to use some resource, so continually creating attached (non-detached) threads that exit will use up system resources — a memory leak.
Read the manual at http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_create.html
According to it:
"Upon successful completion, pthread_create() shall store the ID of the created thread in the location referenced by thread."
I think pthread_create just overwrites the value in the first argument. It does not read it, doesn't care what is inside it. So you can get a new thread from pthread_create, but you can't make it reuse an existing thread. If you would like to reuse your threads, that is more complicated.

is it necessary to call pthread_join()

I create more than 100 threads from my main() so I just wanted to know that do I need to call pthread_join() before I exit my main().
Also, I do not need the data generated by these threads, basically, all the threads are doing some job independent from main() and other threads.
pthread_join does two things:
Wait for the thread to finish.
Clean up any resources associated with the thread.
If you exit the process without joining, then (2) will be done for you by the OS (although it won't do thread cancellation cleanup, just nuke the thread from orbit), and (1) will not. So whether you need to call pthread_join depends whether you need (1) to happen.
If you don't need the thread to run, then as everyone else is saying you may as well detach it. A detached thread cannot be joined (so you can't wait on its completion), but its resources are freed automatically if it does complete.
Yes if thread is attachable then pthread_join is must otherwise it creates a Zombie thread.
Agree with answers above, just sharing a note from man page of pthread_join.
NOTES
After a successful call to pthread_join(), the caller is guaranteed that the target thread has terminated.
Joining with a thread that has previously been joined results in undefined behavior.
Failure to join with a thread that is joinable (i.e., one that is not detached), produces a "zombie thread". Avoid doing this, since each zombie thread consumes some system resources, and when
enough zombie threads have accumulated, it will no longer be possible to create new threads (or processes).
When you exit, you do not need to join because all other threads and resources will be automatically cleaned up. This assumes that you actually want all the threads to be killed when main exits.
If you don't need to join with a thread, you can create it as a "detached" thread by using pthread_attr_setdetachstate on the attributes before creating the thread. Detached threads cannot be joined, but they don't need to be joined either.
So,
If you want all threads to complete before the program finishes, joining from the main thread makes this work.
As an alternative, you can create the threads as detached, and return from main after all threads exit, coordinating using a semaphore or mutex+condition variable.
If you don't need all threads to complete, simply return from main. All other threads will be destroyed. You may also create the threads as detached threads, which may reduce resource consumption.
By default threads in pthreads library are created as joinable.
Threads may, however, detach, rendering them no longer joinable. Because threads consume system resources until joined, just as processes consume resources until their parent calls wait(), threads that you do not intend to join must be detached, which is a good programming practice.
Of course once the main routine exits, all threading resources are freed.
If we fail to do that(detaching), then, when the thread terminates it produces the thread equivalent of a zombie process. Aside from wasting system resources, if enough thread zombies accumulate, we won't be able to create additional threads.
Per default a thread runs attached, that means the resources it needs are kept in use until the thread is joined.
As from your description noone but the thread itself needs the thread's resources, so you might create the thread detached or detach the thread prior to having it started.
To detach a thread after its creation call pthread_detach().
Anyhow if you want to make sure all threads are gone before the program ends, you should run the threads attached and join them before leaving the main thread (the program).
If you want to be sure that your thread have actually finished, you want to call pthread_join.
If you don't, then terminating your program will terminate all the unfinished thread abruptly.
That said, your main can wait a sufficiently long time until it exits. But then, how can you be sure that it is suffucient?
If your main ends your application ends and your threads die... So you do need to use thread join (or use fork instead).

Recreate dead threads after a fork

As you might know, all threads in the application die in a forked process, other than the thread doing the fork. However, I plan to ressurrect those threads in the forked process by calling pthread_create and using pthread_attr_setstack, so as to assign the newly created threads the same stack as the dead threads. Something like as follows.
// stackAddr and stacksize taken from the dead thread
pthread_attr_setstack(&attr, stackAddr, stacksize);
rc = pthread_create(&thread, &attr, threadRoutine, NULL);
However, I would still need to get the CPU register values, such as stack pointer, base pointer, instruction pointer etc, to restart threads from the same point. How can I do that? And what else do I need to do to successfully achieve my goal?
Also note that I'm using a 64-bit architecture. What additional difficulties would it have as compared to 32-bit one?
I see two possible ways to shoot yourself in the foot and lose hair^W^W^W^W^W^W^W^Wtry to do this:
Try to force each thread into calling getcontext() before the fork(), and then restore the context of each thread via setcontext(). Probably won't work, but you can try for fun.
Save ptrace(PTRACE_GETREGS), ptrace(PTRACE_GETFPREGS), and restore with ptrace(PTRACE_SETREGS), ptrace(PTRACE_SETFPREGS).
The other threads in the current process aren't killed by a fork -- they're still there and running in the parent. The problem you seem to have is that fork only forks a SINGLE thread in the current procces, creating a new process running one thread with a copy of all non-thread resources in the parent.
What you apparently want is a way of duplicating an entire multithreaded task, forking all the threads in it and creating a new process/task with the same number of threads.
In order to do THAT, you would need to find and pause all the other threads in the process, dump their current state (including all locks they hold), fork a new process, and then (re)create each of those other threads in the child, rewiring the lock state to refer to the new child threads where needed.
Unfortunately, the POSIX pthread interface is hopelessly underspecified, and provides no way of doing that. In particular, it lacks any sort of reflective interface allowing you to figure out what threads are actually running.
If you want to try to do this anyway, I can see two ways of trying to approach this:
poke around in /proc/self/task to figure out what threads are running in your process, effectively getting that reflective interface in a highly non-portable way. You'll likely end up having to ptrace(2) the other threads to get their internal state. This will be very difficult.
wrap the pthreads library -- instead of using library directly, intercept every call and keep track of all the threads/mutexes/locks that get created, so that you have that information available when you want to fork. This will work fine as long as you don't want to use any third-party libraries that use pthreads
The second option is much easier (and somewhat portable), but only works well if you have access to all the source code of your entire application, and can modify it to use your wrappers properly.
Just googling around I found that solaris has a forkall() call that does exactly what you want, see the documentation here:
http://download.oracle.com/docs/cd/E19963-01/html/821-1601/gen-1.html
I assume you're running on linux, but it is possible to run solaris on x86 hardware. So maybe that is an option for you.

Resources