I'm curious how a single NPTL thread exits, from an implementation perspective.
What I understand about glibc-2.30's implementation:
An NPTL thread is built on top of a lightweight process on Linux, with additional information stored in a pthread object on the user stack to keep track of NPTL-specific state such as join/detach status and the returned object's pointer.
When an NPTL thread finishes, it is gone for good; only the user stack (and hence the pthread object) is left to be collected when another thread joins it, unless the thread is detached, in which case that space is freed directly.
The _exit() syscall kills all threads in a thread group.
The user function that pthread_create() takes is actually wrapped in another function, start_thread(), which does some preparation before running the user function and some cleanup afterwards.
Questions are:
At the end of the wrapper function start_thread(), there are the following comment and code:
  /* We cannot call '_exit' here.  '_exit' will terminate the process.
     The 'exit' implementation in the kernel will signal when the
     process is really dead since 'clone' got passed the CLONE_CHILD_CLEARTID
     flag.  The 'tid' field in the TCB will be set to zero.
     The exit code is zero since in case all threads exit by calling
     'pthread_exit' the exit status must be 0 (zero).  */
  __exit_thread ();
but __exit_thread() seems to make the _exit() syscall anyway:
static inline void __attribute__ ((noreturn, always_inline, unused))
__exit_thread (void)
{
  /* some comments here */
  while (1)
    {
      INTERNAL_SYSCALL_DECL (err);
      INTERNAL_SYSCALL (exit, err, 1, 0);
    }
}
so I'm confused here, since it shouldn't really do syscall _exit() because it will terminate all threads.
pthread_exit() should terminate a single thread, so it should do something similar to what the wrapper start_thread() does in the end, however it calls __do_cancel(), and TBH I am lost in tracing down that function. It does not seem to be related to the above __exit_thread(), nor does it call _exit().
I'm confused here, since it shouldn't really do syscall _exit()
The confusion here stems from mixing up the exit system call with the _exit libc routine (there is no _exit system call on Linux).
The former terminates the current Linux thread (as intended).
The latter (confusingly) doesn't execute the exit system call. Rather, it executes the exit_group system call, which terminates all threads.
pthread_exit() should terminate a single thread
It does, indirectly. It unwinds the current stack (similar to siglongjmp), transferring control to the point where cleanup_jmp_buf was set up, which is in start_thread.
After the control transfer, start_thread cleans up resources and calls __exit_thread to actually terminate the thread.
When our UNIX/C program needs an emergency exit, we use the exit (3) function and install atexit (3) handlers for emergency clean-ups. This approach worked fine until our application got threaded, at which point atexit() handlers stopped working predictably.
We learned by trial and error that threads may already be dead in an atexit() handler, and their stacks deallocated.
I failed to find a quote in the standard linking thread disappearance with atexit(): threads cease to exist after return from main(), but is it before invocation of atexit() or after? What's the actual practice on Linux, FreeBSD and Mac?
Is there a good pattern for emergency cleanup in a multi-threaded program?
POSIX Standard
It doesn't seem to be defined by POSIX whether atexit handlers are called before or after threads are terminated by exit.
There are two (or three) ways for a process to terminate "normally".
All threads terminate. When the last thread exits, either by returning or calling pthread_exit, atexit handlers are run. In this case there are no other threads. (This is platform dependent. Some platforms may terminate other threads if the main thread terminates other than by exit, others do not).
One thread calls exit. In this case, atexit handlers will be run and all threads terminated. POSIX doesn't specify in what order.
main returns. This is more-or-less equivalent to calling exit() as the last line of main, so can be treated as above.
OS Practice
On Linux, the documentation (https://linux.die.net/man/2/exit) says threads are terminated by _exit calling exit_group, and that _exit is called after atexit handlers. Therefore on Linux, calling exit runs any atexit handlers before the threads are terminated. Note that they are run on the thread calling exit, not the thread that called atexit.
On Windows the behaviour is the same, if you care.
Patterns for emergency cleanup.
The best pattern is: Never be in a state which requires emergency cleanup.
There is no guarantee that your cleanup will run because
you could have a kill -9 or
a power outage.
Therefore you need to be able to recover in that scenario.
If you can recover from that, you can also recover from abort, so you can use abort for your emergency exit.
If you can't do that, or if you have "nice-to-have" cleanup you want to do, atexit handlers should be fine provided you first gracefully stop all threads in the process to prevent entering an inconsistent state while doing cleanup.
I am using POSIX threads. My question is whether a thread can cancel itself by passing its own thread ID to pthread_cancel.
If yes, what are the implications?
Also, if a main program creates two threads and one of the threads cancels the other, what happens to the return value and the resources of the cancelled thread?
And how can the main program know which thread was cancelled, since the main program is not cancelling any of the threads?
I am using asynchronous cancellation.
Kindly help.
Q1: Yes, a thread can cancel itself. However, doing so has all of the negative consequences of cancellation in general; you probably want to use pthread_exit instead, which is somewhat more predictable.
Q2: When a thread has been cancelled, it doesn't get to generate a return value; instead, pthread_join will put the special value PTHREAD_CANCELED in the location pointed to by its retval argument. Unfortunately, you have to know by some other means that a specific thread has definitely terminated (in some fashion) before you call pthread_join, or the calling thread will block forever. There is no portable equivalent of waitpid(..., WNOHANG) nor of waitpid(-1, ...). (The manpage says "If you believe you need this functionality, you probably need to rethink your application design" which makes me want to punch someone in the face.)
Q2a: It depends what you mean by "resources of the thread". The thread control block and stack will be deallocated. All destructors registered with pthread_cleanup_push or pthread_key_create will be executed (on the thread, before it terminates); some runtimes also execute C++ class destructors for objects on the stack. It is the application programmer's responsibility to make sure that all resources owned by the thread are covered by one of these mechanisms. Note that some of these mechanisms have inherent race conditions; for instance, it is impossible to open a file and push a cleanup that closes it as an atomic action, so there is a window where cancellation can leak the open file. (Do not think this can be worked around by pushing the cleanup before opening the file, because a common implementation of deferred cancels is to check for them whenever a system call returns, i.e. exactly timed to hit the tiny gap between the OS writing the file descriptor number to the return-value register, and the calling function copying that register to the memory location where the cleanup expects it to be.)
Qi: you didn't ask this, but you should be aware that a thread with asynchronous cancellation enabled is officially not allowed to do anything other than pure computation. The behavior is undefined if it calls any library function other than pthread_cancel, pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED), or pthread_setcancelstate(PTHREAD_CANCEL_DISABLE).
Q1. Yes, a thread can cancel itself.
Q2. If one thread cancels another thread, the cancelled thread's resources hang around until the main thread
joins it with pthread_join() (if the thread is joinable). If the cancelled
thread is never joined, its resources are freed when the program terminates.
Q3. I am not sure, but the main program is not notified of which thread was cancelled; it can find out by joining each thread and checking for PTHREAD_CANCELED.
A thread can cancel any other thread (within the same process), including itself.
Threads do not have return values in the general sense (they can only have a return status); the resources of the thread will be freed upon cancellation.
The main program can store each thread's handle and test whether it is still valid.
I have an application in which I have 1 main thread which creates 10 different threads for doing some job. At the end of the application, when I try to exit, the application is not able to exit cleanly. The stack trace is not that useful, but it's showing the crash in the function "cancel_deliver()". My first guess is that this is some underlying call made while freeing the resources used by each thread, but I'm not entirely sure.
fyi: The callback function for each thread has a while (1) loop:
Here is the snippet
void main (...)
{
    pthread_t tid;
    for (int i=0; i<10; i++)
        pthread_create(&tid, NULL, xyzCallback, NULL);
}
void xyzCallback(void* data)
{
    while (1)
    {
        ////
    }
}
void atExit()
{
    exit(1);
}
Is there anything that I can do to free up the resources used by my threads and cleanly exit?
For this case
If I understand your setup correctly...
One thing you could do is have a set of 'flag' variables, one for each thread (including the main thread). When the main thread is ready to end, set its flag. This flag should be continually checked within the 10 other threads. Once it becomes set, change the flag variable for that specific thread and call pthread_exit. In the main exit method, only terminate once all the flag variables are set.
Assuming your program isn't crashing due to another reason, this should enable all the threads to finish in a controlled manner.
(or use pthread_join in the main exit function, since pthread_exit returns information used by pthread_join)
In general
Use pthread_exit instead of exit(1) to cleanly exit the thread.
From the LLNL POSIX Thread Programming page:
There is a definite problem if main() finishes before the threads it spawned if you don't call pthread_exit() explicitly. All of the threads it created will terminate because main() is done and no longer exists to support the threads.
Also see the pthread_exit man page.
You need to decide whether your threads (1) need to end in a well-defined way (state), or (2) if that does not matter, take care where and when they could be cancelled (which happens implicitly when the program ends).
Regarding 1: A possible way is to implement an exit condition that is triggered when the program is about to terminate and makes the threads leave their while(1) loop. Before main returns, it calls pthread_join() for every pthread_t it received from its calls to pthread_create(). Once all threads have terminated, main's join loop finishes and main ends with all threads already terminated.
Regarding 2: This is the more critical case, since depending on the thread function's code it is not clear where it will be cancelled, which could lead to unexpected behaviour. A thread can be cancelled at a so-called cancellation point. Some system calls are treated as such.
In any case the thread function does not necessarily need to call pthread_exit() as last statement. This is only necessary if you want to have pthread_join() receive a pointer passed to pthread_exit().
What's the difference between pthread_exit() and exit()?
Did you read man pages?
exit() performs normal program termination, while pthread_exit() terminates the calling thread.
pthread_exit terminates a thread. Per the docs
Thread termination does not release any application visible process
resources, including, but not limited to, mutexes and file
descriptors, nor does it perform any process level cleanup actions,
including, but not limited to, calling any atexit() routines that may
exist.
exit, on the other hand, does do this.
the differences:
pthread_exit(): terminates a thread, whether its work is done or not.
exit(): performs normal program termination for the entire process.
Threads are created using pthread_create(). Each thread can then independently
terminate using pthread_exit(). (If any thread calls exit(), then all threads immediately terminate.) Unless a thread has been marked as detached (e.g., via a call to
pthread_detach()), it must be joined by another thread using pthread_join(), which
returns the termination status of the joined thread.
I am making a small project which will be incorporated into a larger project. Basically, it keeps track of the threads that are created by adding them to a main struct, which records what each thread does (its main function) and its pthread_t ID. The other struct keeps track of the data to be passed to the function and the index at which the pthread_t ID is stored inside threads[]. It's a bit Mickey Mouse and it jumps around a bit, but it all works except when it is time to kill the thread. I get no segfaults and no errors and the program finishes fine, but the thread does not get killed when pthread_kill() is called (the function returns 0, meaning no error), and the thread continues to run until the main application returns.
pthread_kill() will not kill a thread. The only difference from kill() is that the signal is handled by the designated thread, and is not handled while that thread has the signal masked (see pthread_sigmask()). A signal like SIGTERM will by default still terminate the entire process.
If you are considering calling pthread_exit() from a signal handler, you should probably use pthread_cancel() instead.
Cancellation is safe if all code that may be cancelled cooperates (or the code that calls it disables cancellation for the time). Most libraries do not care about this, though.
A safer method is to ask the thread to exit without any force, such as by sending a special message to it (if the thread normally processes messages).
Alternatively, don't bother to kill any threads and just call _exit(), _Exit() or quick_exit().
From http://pubs.opengroup.org/onlinepubs/7908799/xsh/pthread_kill.html
As in kill(), if sig is zero, error checking is performed but no signal is actually sent.
so the following
pthread_kill(threads[i].tID, 0);
Won't actually kill the thread: with sig equal to 0, no signal is sent at all. You need to use an actual signal to kill a thread. A list of signals can be found here:
http://pubs.opengroup.org/onlinepubs/7908799/xsh/signal.h.html