I'm debugging a sort of weird issue where it looks like a thread that's killed in an atexit handler is accessing a shared library, and is segfaulting because that shared library is unloaded before the exit handler runs. I'm not sure this is actually the issue, but it's a hunch on what might be happening.
When a process terminates (main exits or exit() is called), is the atexit handler the immediate next thing to run? My mind says so, but the segfault I'm seeing seems to say otherwise.
Is there any difference (with regards to exit handling) between main returning (either end of function or with return) and calling exit() directly?
When a process terminates (main exits or exit() is called), is the atexit handler the immediate next thing to run? My mind says so, but the segfault I'm seeing seems to say otherwise.
Not necessarily. You're guaranteed that atexit handlers will run, but that's all; they may even run concurrently with other things. While you're using C, remember that C++ code may be in the same process. C++ says that atexit handlers may be called concurrently with destructors being run for objects of static storage duration. This means that atexit is very dangerous, and you need to be very careful about what you call from a handler.
Is there any difference (with regards to exit handling) between main returning (either end of function or with return) and calling exit() directly?
According to the documentation: No.
The safest thing to do when tearing down the house: Nothing. Just get out and let the house be torn down. Closing the drapes on the way out isn't worth the time so to speak.
There's a conflict between man pages which I don't understand.
man 7 pthreads says that:
POSIX.1-2001 and POSIX.1-2008 require that all functions
specified in the standard shall be thread-safe, except for the
following functions:
and exit() isn't in the thread-safety exception list.
However, man 3 exit says:
The exit() function uses a global variable that is not protected, so it is not thread-safe.
By googling it seems that exit() is actually thread unsafe. So what's wrong with my understanding of man pages? Why is exit() not listed as thread unsafe in man 7 pthreads?
Why is exit() not listed as thread unsafe in man 7 pthreads?
That is a question for the documentation authors. But I can say why it should never be considered thread-safe (from a userspace/C language perspective). The short version: well, it's complicated.
You can't legally call exit twice, so that takes care of the data race: it is already explicitly undefined behavior to call exit or quick_exit anywhere in the application again after one of the two has been called.
Once you call exit you've basically said to the runtime "Hey I need to tear down the house" the house being your application. The runtime then runs atexit handlers per the standard.
Also worth noting: atexit handlers are not guaranteed to be thread-safe, but atexit itself is, so handlers that threads have registered on their own will get called too.
So what does all this mean:
exit is not thread safe. But if you're tearing down the house does it really matter if the house was constructed to code anymore? Once you've called exit and the atexit handlers have finished then the kernel kills the threads, closes any handles, and reclaims everything. Worrying about closing the drapes when the bulldozer is coming through the wall is kind of pointless.
The C standard disallows exit from being called more than once. Section 7.22.4.4 regarding the exit function paragraph 2 states:
The exit function causes normal program termination to occur. No
functions registered by the at_quick_exit function are called.
If a program calls the exit function more than once, or calls the quick_exit function in addition to the exit function, the
behavior is undefined.
Additionally, the POSIX man page for exit, i.e. man 3p exit also states this:
If a function registered by a call to atexit() fails to return, the
remaining registered functions shall not be called and the rest of the
exit() processing shall not be completed. If exit() is called more than
once, the behavior is undefined.
So the function is not thread safe by definition.
Perhaps the answer is in https://man7.org/linux/man-pages/man3/pthread_exit.3.html
You call pthread_exit(). Not exit()
Eventually, when the last thread exits, exit is called for you.
exit is not thread-safe for these reasons:
it abruptly terminates all the threads. How can something which kills all the threads be "thread-safe"?
exit must not be called more than once; that is undefined behavior. Therefore the basic thread-safety question "can this function be called from two threads at around the same time" is not answerable in the affirmative.
exit can execute arbitrary application code: the atexit handlers. According to POSIX, calling the atexit handlers is the first thing that exit does, and that means this happens while threads are running. These handlers can be badly written such that they lack thread safety, which means that their parent function, exit, is rendered unsafe.
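The second point above (exit must not be called more than once) can be sketched in code. This is a minimal illustration, not something taken from the standard or from libc: claim_exit and exit_once are hypothetical names, and the sketch assumes every thread in the program goes through exit_once instead of calling exit directly.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdatomic.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical guard: set by the first thread that decides to exit. */
static atomic_flag exit_claimed = ATOMIC_FLAG_INIT;

/* Returns 1 for the first caller and 0 for every later caller. */
int claim_exit(void)
{
    return !atomic_flag_test_and_set(&exit_claimed);
}

/* Hypothetical wrapper: only the first caller ever reaches exit(). */
void exit_once(int status)
{
    if (claim_exit())
        exit(status);   /* atexit handlers run on this thread */

    /* Another thread is already exiting: park until the process ends. */
    for (;;)
        pause();
}
```

This only addresses concurrent calls from different threads. Calling exit_once from inside an atexit handler would park the exiting thread forever, and the handlers themselves still have to be written with running threads in mind.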
I know that atexit is used to register a handler function, and that the handler is called when an exit occurs in the code. But what if an exit occurs inside the handler itself?
I was expecting an infinite loop, but in reality the program exits normally. Why?
#include <stdio.h>
#include <stdlib.h>

void handler(void) {
    printf("exit\n");
    exit(1);  /* calls exit again from inside the handler */
}

int main(int argc, char *argv[]) {
    atexit(handler);
    exit(1);
}
The behavior is undefined.
7.22.4.4 The exit function
2 The exit function causes normal program termination to occur. No
functions registered by the at_quick_exit function are called. If a
program calls the exit function more than once, or calls the
quick_exit function in addition to the exit function, the behavior is
undefined.
Calling exit in an atexit handler (that is being run during the normal processing of exit) is definitely a second call to exit.
Exiting normally is a possible behavior, but seeing as anything can happen (the nature of the behavior being undefined), it could very well result in catastrophe. Best not to do it.
As others have pointed out, the behaviour is undefined. Still, to explain what you observed: library writers usually try to cope with strange program behaviour (such as calling exit() while the program is already inside exit()), so I'd say:
It is possible that the exit(3) function, before calling each exit handler, unregisters it from the list of exit handlers. This would make exit(3) call each handler only once and never recursively. Registering the handler again from inside itself, to see what happens, would be a good exercise.
It is possible that the exit function marks itself as running and, if called from inside a handler, just returns as if nothing had happened.
It is possible that you get the behaviour you expected, which could lead to a stack overflow (no pun intended :))
It is possible to ...
Whatever happens falls under the undefined behaviour mentioned in other answers, but for a library that tries to go beyond the standard and behave sensibly, probably the best behaviour is to avoid recursive calls to exit handlers in one of the ways proposed.
On the other hand, you had better not rely on this feature (let's call it that) in your programs because, as it is not endorsed by the standard, it can get you into trouble if you port your programs elsewhere in the future.
You probably think of exit(3) as a function that is never called twice (apart from recursively, as in your example), but suppose your program has several threads and two of them call exit(3) at the same time.
Probably the best behaviour is to have some kind of semaphore that protects the list of handlers from concurrent access. To keep the list locked for as short a time as possible, a thread can unlink one handler from the list (think of the list as a queue from which each thread takes a handler), unlock the queue, and then execute the handler. This can result in each handler being executed by any one of the threads that called exit(). How to deal with several threads calling exit() at the same time is the first thing an implementor faces.
POSIX.1 says that the result of calling exit(3) more than once (i.e., calling exit(3) within a function registered using atexit()) is undefined. On some systems (but not Linux), this can result in an infinite recursion; portable programs should not invoke exit(3) inside a function registered using atexit().
(Source: the NOTES section of the Linux atexit(3) man page.)
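If a handler really has to end the process itself, one way to avoid a second call to exit is to call _exit from the handler; the Linux atexit(3) man page describes this case (remaining handlers and the rest of exit processing are skipped). The sketch below is only an illustration: bail_out_handler and exit_status_with_handler are hypothetical names, and the fork is there just so the effect can be observed from a parent process.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Handler that ends the process without calling exit() a second time. */
static void bail_out_handler(void)
{
    _exit(7);   /* skips remaining handlers and stdio flushing */
}

/* Forks a child that calls exit(1) with the handler installed.
 * Returns the child's exit status, or -1 on error. */
int exit_status_with_handler(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (atexit(bail_out_handler) != 0)
            _exit(99);
        exit(1);        /* the handler runs and replaces the status */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The child asks for status 1 but the parent sees 7, because the handler terminated the process before exit finished. The price is that buffered stdio output and any later handlers are lost.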
I have read that TerminateThread() in WinAPI is dangerous to use.
Is pthread_kill() in Linux also dangerous to use?
Edit: Sorry I meant pthread_kill() and not pthread_exit().
To quote Sir Humphrey Appleby, the answer is "yes, and no".
In and of itself calling pthread_exit() is not dangerous and is called implicitly when your thread exits its method. However, there are a few "gotchas" if you call it manually.
All cleanup handlers are called when this is called. So if you call this method, then access some memory that the cleanup handlers have cleaned up, you get a memory error.
Similarly, after this is called, any local and thread-local variables for the thread become invalid. So if a reference is made to them you can get a memory error.
If this has already been called for the thread (implicitly or explicitly), calling it again has undefined behaviour.
If this is the last thread in your process, this will cause the process to exit.
If you are careful about the above (i.e. if you are careful not to reference anything about the thread after you have called pthread_exit), then it is safe to call it manually. However, if you are using C++ instead of C, I would highly recommend using the std::thread class rather than doing it manually. It is easier to read, involves less code, and ensures that you are not breaking any of the above.
For more information type "man pthread_exit" which will essentially tell you the above.
Since the question has now been changed, I will write a new answer. My answer still remains "yes and no" but the reasons have changed.
pthread_kill is somewhat dangerous in that it shares the potential timing risks that are inherent in all signal handling systems. In addition, there are complexities in dealing with it; specifically, you have to set up a signal handler within the thread. However, one could argue that it is less dangerous than the Windows function you mention. Here is why:
The Windows function essentially stops the thread, possibly bypassing the proper cleanup. It is intended as a last resort option. pthread_kill, on the other hand, does not terminate the thread at all. It simply sends a signal to the thread that the thread can respond to.
In order for this to do something you need to have registered in the thread what signals you want it to handle. If your goal is to use pthread_kill to terminate the thread, you can use this by having your signal handler set a flag that the thread can access, and having the thread check the flag and exit when it gets set. You may be able to call pthread_exit from the signal handler (I've never tried that) but it strikes me as being a bad idea since the signal comes asynchronously, and your thread is not guaranteed to still be running. The flag option I mention solves this provided that the flag is not local to the thread, allowing the signal handler to set it even if the target thread has already exited. Of course if you are doing this, you don't really need pthread_kill at all, as you can simply have your main thread set the flag at the appropriate time.
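The flag approach described above can be sketched without any signal at all. This is a minimal illustration under some assumptions: stop_requested, worker and run_and_stop_worker are hypothetical names, the flag is a C11 atomic shared by all threads, and sched_yield() stands in for the thread's real work.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Shared flag; deliberately not local to the worker thread. */
static atomic_int stop_requested;

static void *worker(void *arg)
{
    (void)arg;
    while (!atomic_load(&stop_requested))
        sched_yield();              /* stand-in for one unit of work */
    return NULL;                    /* the thread ends itself cleanly */
}

/* Starts the worker, asks it to stop, and waits for it.
 * Returns 0 on success, nonzero on error. */
int run_and_stop_worker(void)
{
    pthread_t tid;

    atomic_store(&stop_requested, 0);
    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;
    atomic_store(&stop_requested, 1);   /* request termination */
    return pthread_join(tid, NULL);     /* 0 once the worker has returned */
}
```

Because the worker only stops between units of work, it never dies while holding a lock or halfway through an update, which is the main advantage over forcibly stopping it.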
There is another option for stopping another thread: pthread_cancel. This function places a cancellation request on the target thread and, if the thread is cancellable (new threads are by default, and a thread can change this for itself with pthread_setcancelstate and pthread_setcanceltype), then the next time the thread reaches a cancellation point (pthread_testcancel creates one explicitly, and many system routines such as the I/O calls are cancellation points as well), it will exit. This is also safer than what Windows does, as it does not violently stop the thread; with the default deferred cancellation it only stops at well-defined points. But it is more work than the Windows version, as you have to write the thread so that it can be cancelled cleanly.
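A small sketch of deferred cancellation may help here. It assumes the default cancellation settings of a new thread (enabled, deferred); on_cancel, cancellable_worker and cancel_demo are hypothetical names chosen for the illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>

static int cleanup_ran;     /* set by the cleanup handler */

static void on_cancel(void *arg)
{
    *(int *)arg = 1;        /* real code would release resources here */
}

static void *cancellable_worker(void *arg)
{
    (void)arg;
    pthread_cleanup_push(on_cancel, &cleanup_ran);
    for (;;) {
        struct timespec ts = {0, 1000000};   /* 1 ms */
        pthread_testcancel();                /* explicit cancellation point */
        nanosleep(&ts, NULL);                /* also a cancellation point */
    }
    pthread_cleanup_pop(0);                  /* never reached; pairs with push */
    return NULL;
}

/* Returns 1 if the worker was cancelled and its cleanup handler ran,
 * 0 if not, and -1 on error. */
int cancel_demo(void)
{
    pthread_t tid;
    void *result = NULL;

    cleanup_ran = 0;
    if (pthread_create(&tid, NULL, cancellable_worker, NULL) != 0)
        return -1;
    if (pthread_cancel(tid) != 0)
        return -1;
    if (pthread_join(tid, &result) != 0)
        return -1;
    return result == PTHREAD_CANCELED && cleanup_ran;
}
```

The cleanup handler is what makes this safer than a forced kill: the thread stops only at a cancellation point, and on_cancel runs before it disappears.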
The Wikipedia page for "posix threads" describes some of this (but not much), but it has a pretty good "See also" and "References" section that will give you more details.
When our UNIX/C program needs an emergency exit, we use the exit(3) function and install atexit(3) handlers for emergency clean-ups. This approach worked fine until our application became threaded, at which point atexit() handlers stopped working predictably.
We learned by trial and error that threads may already be dead by the time an atexit() handler runs, with their stacks deallocated.
I failed to find a quote in the standard linking thread disappearance with atexit(): threads cease to exist after return from main(), but is it before invocation of atexit() or after? What's the actual practice on Linux, FreeBSD and Mac?
Is there a good pattern for emergency cleanup in a multi-threaded program?
Posix Standard
It doesn't seem to be defined by Posix whether atexit handlers are called before or after threads are terminated by exit.
There are two (or three) ways for a process to terminate "normally".
All threads terminate. When the last thread exits, either by returning or calling pthread_exit, atexit handlers are run. In this case there are no other threads. (This is platform dependent. Some platforms may terminate other threads if the main thread terminates other than by exit, others do not).
One thread calls exit. In this case, atexit handlers will be run and all threads terminated. Posix doesn't specify in what order.
main returns. This is more-or-less equivalent to calling exit() as the last line of main, so can be treated as above.
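The first case in the list above (all threads terminate on their own) can be observed with a sketch like the following. It relies on the POSIX rule that the process exits as if exit(0) had been called once the last thread terminates; the fork and the pipe exist only so that a parent can record the order of events, and all names (log_fd, log_char, last_thread_exit_demo) are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int log_fd = -1;     /* write end of a pipe, used in the child */

static void log_char(char c)
{
    ssize_t n = write(log_fd, &c, 1);
    (void)n;                /* best effort; errors are ignored here */
}

static void at_exit_handler(void) { log_char('A'); }

static void *worker(void *arg)
{
    (void)arg;
    log_char('W');
    return NULL;
}

/* Returns 1 if the child logged "WA" (worker first, handler second),
 * 0 if it logged something else, and -1 on error. */
int last_thread_exit_demo(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    fflush(NULL);           /* avoid duplicating buffered output */
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        pthread_t tid;
        close(fds[0]);
        log_fd = fds[1];
        if (atexit(at_exit_handler) != 0 ||
            pthread_create(&tid, NULL, worker, NULL) != 0)
            _exit(99);
        pthread_exit(NULL); /* ends only this thread; exit() is not called */
    }
    close(fds[1]);
    char buf[8] = {0};
    size_t total = 0;
    ssize_t n;
    while (total < sizeof buf - 1 &&
           (n = read(fds[0], buf + total, sizeof buf - 1 - total)) > 0)
        total += (size_t)n;
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return strcmp(buf, "WA") == 0;
}
```

Whichever thread finishes last, the handler cannot run before the worker has returned, so in this case the handler sees no other live threads.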
OS Practice
In Linux, the documentation (https://linux.die.net/man/2/exit) says that threads are terminated by _exit calling exit_group, and that _exit is called after atexit handlers. Therefore, on Linux, when exit is called any atexit handlers are run before threads are terminated. Note that they are run on the thread calling exit, not the thread that called atexit.
On Windows the behaviour is the same, if you care.
Patterns for emergency cleanup.
The best pattern is: Never be in a state which requires emergency cleanup.
There is no guarantee that your cleanup will run because
you could have a kill -9 or
a power outage.
Therefore you need to be able to recover in that scenario.
If you can recover from that, you can also recover from abort, so you can use abort for your emergency exit.
If you can't do that, or if you have "nice-to-have" cleanup you want to do, atexit handlers should be fine provided you first gracefully stop all threads in the process to prevent entering an inconsistent state while doing cleanup.
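The last suggestion (stop the threads first, then clean up) could look roughly like this. It is a sketch under stated assumptions: there is a single worker, exit is called from a thread other than the worker, and the names (stop_requested, worker_running, cleanup_done, stop_worker_then_cleanup, start_worker_with_cleanup) are all hypothetical.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>

static atomic_int stop_requested;
static pthread_t worker_tid;
static int worker_running;  /* touched only by the starting/exiting thread */
int cleanup_done;           /* hypothetical marker standing in for cleanup */

static void *worker(void *arg)
{
    (void)arg;
    while (!atomic_load(&stop_requested))
        sched_yield();      /* stand-in for real work */
    return NULL;
}

/* atexit handler: stop and join the worker first, then clean up. */
void stop_worker_then_cleanup(void)
{
    if (!worker_running)
        return;                         /* nothing to stop, or already done */
    atomic_store(&stop_requested, 1);
    pthread_join(worker_tid, NULL);     /* the worker is gone after this */
    worker_running = 0;
    cleanup_done = 1;                   /* real cleanup would go here */
}

/* Starts the worker and registers the handler. Returns 0 on success. */
int start_worker_with_cleanup(void)
{
    if (pthread_create(&worker_tid, NULL, worker, NULL) != 0)
        return -1;
    worker_running = 1;
    return atexit(stop_worker_then_cleanup);
}
```

Because the handler joins the worker before touching anything else, the cleanup code never races with a thread that is still running. The worker_running check makes the handler harmless if it runs when the worker has already been stopped.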
I want to set up a signal handler for SIGSEGV, SIGILL and possibly a few other signals that, rather than terminating the whole process, just terminates the offending thread and perhaps sets a flag somewhere so that a monitoring thread can complain and start another thread. I'm not sure there is a safe way to do this. Pthreads seems to provide functions for exiting the current thread, as well as canceling another thread, but these potentially call a bunch of at-exit handlers. Even if they don't, it seems as though there are many situations in which they are not async-signal-safe, although it is possible that those situations are avoidable. Is there a lower-level function I can call that just destroys the thread? Assuming I modify my own data structures in an async-signal-safe way, and acquire no mutexes, are there pthread/other global data structures that could be left in an inconsistent state simply by a thread terminating at a SIGSEGV? malloc comes to mind, but malloc itself shouldn't SIGSEGV/SIGILL unless the libc is buggy. I realize that POSIX is very conservative here, and makes no guarantees. As long as there's a way to do this in practice I'm happy. Forking is not an option, btw.
If the SIGSEGV/SIGILL/etc. happens in your own code, the signal handler will not run in an async-signal context (it's fundamentally a synchronous signal, but the handler would still run in an async-signal context if the fault happened inside a standard library function), so you can legally call pthread_exit from the signal handler. However, there are still issues that make this practice dubious:
SIGSEGV/SIGILL/etc. never occur in a program whose behavior is defined unless you generate them via raise, kill, pthread_kill, sigqueue, etc. (and in some of these special cases, they would be asynchronous signals). Otherwise, they're indicative of a program having undefined behavior. If the program has invoked undefined behavior, all bets are off. UB is not isolated to a particular thread or a particular sequence in time. If the program has UB, its entire output/behavior is meaningless.
If the program's state is corrupted (e.g. due to access-after-free, use of invalid pointers, buffer overflows, ...) it's very possible that the first faulting access will happen inside part of the standard library (e.g. inside malloc) rather than in your code. In this case, the signal handler runs in an async-signal context, where only async-signal-safe functions may be called, and so it cannot call pthread_exit. Of course the program already has UB anyway (see the above point), but even if you wanted to pretend that's not an issue, you'd still be in trouble.
If your program is experiencing these kinds of crashes, you need to find the cause and fix it, not try to patch around it with signal handlers. Valgrind is your friend. If that's not possible, your best bet is to isolate the crashing code into separate processes where you can reason about what happens if they crash asynchronously, rather than having the crashing code in the same process (where any further reasoning about the code's behavior is invalid once you know it crashes).
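The process-isolation idea can be sketched as follows. This is an illustration only: run_isolated, crashing_task and healthy_task are hypothetical names, raise(SIGSEGV) stands in for a real fault, and the sketch assumes fork is called while the parent has no other threads that hold locks the child would need.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs fn in a child process. Returns 0 if the child exited normally,
 * the terminating signal number if it was killed, or -1 on error. */
int run_isolated(void (*fn)(void))
{
    fflush(NULL);               /* avoid duplicating buffered output */
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        fn();
        _exit(0);               /* skip the parent's atexit handlers */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (WIFSIGNALED(status))
        return WTERMSIG(status);
    return 0;
}

/* Hypothetical task that crashes; raise() stands in for a real fault. */
void crashing_task(void)
{
    signal(SIGSEGV, SIG_DFL);   /* make sure the default action applies */
    raise(SIGSEGV);
}

/* Hypothetical task that completes normally. */
void healthy_task(void)
{
}
```

A monitoring loop in the parent can then call run_isolated again after a crash. Whatever the child corrupted dies with it, so the parent's own state remains something you can reason about.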