Determining which signal interrupted my system call (Linux) - c

After my system call has returned because it was interrupted by a signal, is there a way to determine exactly which signal (e.g. child process termination) caused the interruption?

There are a number of facilities in Linux for dealing with signals:
waitpid(2) can be used to wait inline for child termination (the event that SIGCHLD reports)
sigaction(2) can be used to set up handler functions that react to specific signals; the SA_RESTART flag affects whether certain system calls are interrupted or restarted
sigprocmask(2) can be used to block a number of signals
sigwait(3) can be used to wait inline for a number of signals
Kernels since Linux 2.6.22 support signalfd(2), which is convenient when you need to combine signal handling with non-blocking I/O.
Then there's a whole further level of complexity once threads are involved. That said, if you deal with signals explicitly, you usually don't need to care which signal interrupted the system call.

You need to set a handler. The handler is called with the signal number as its argument, so it can record which signal arrived.
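A minimal sketch of that idea, assuming a handler installed without SA_RESTART and a short interval timer standing in for whatever would really send the signal. The names record_signal and which_signal_interrupted are made up for illustration; nanosleep() plays the role of the interrupted system call.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

/* Written only by the handler; sig_atomic_t can be stored safely there. */
static volatile sig_atomic_t last_signal = 0;

static void record_signal(int signo)
{
    last_signal = signo;
}

/* Hypothetical helper: returns the number of the signal that interrupted
 * nanosleep(), 0 if the sleep finished undisturbed, -1 on a setup error. */
int which_signal_interrupted(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = record_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                      /* deliberately no SA_RESTART */
    if (sigaction(SIGALRM, &sa, NULL) == -1)
        return -1;

    last_signal = 0;

    /* Request one SIGALRM in 50 ms (it_interval = 0, it_value = 50 ms). */
    struct itimerval timer = { { 0, 0 }, { 0, 50000 } };
    if (setitimer(ITIMER_REAL, &timer, NULL) == -1)
        return -1;

    /* The blocking call: it fails with EINTR once the handler has run. */
    struct timespec two_seconds = { 2, 0 };
    if (nanosleep(&two_seconds, NULL) == -1 && errno == EINTR)
        return last_signal;
    return 0;
}
```

If several different signals share the handler, the stored value only tells you the most recent one, which is one reason the sigwait-style approaches can be easier to reason about.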

Related

Does multi-thread program handle multiple signals in parallel?

Platform is Linux/POSIX.
The signal is sent to a whole process, not a specific thread.
No signal is set to blocked, all default.
The process is multi-thread process.
From what I've googled, a signal may be handled by a random thread.
And when that signal's handler is executing, it's temporarily blocked until handler returns.
QUESTION: Multiple signals of different types arrive simultaneously. Do their handlers execute simultaneously on multiple threads, or do they all go to one randomly picked thread? (SUB-QUESTION: in the latter case a handler could interrupt another handler that started earlier, so could there be a stack of interrupted handlers?) Or is it a mix? For instance, 3 types of signal are received but only 2 threads are free (this is actually the first case).
EXAMPLE: SIGHUP, SIGINT and SIGTERM arrive almost simultaneously. The program has two threads available to run signal handlers.
SIDE-QUESTION: If signal handlers run in parallel, I'll have to use mutex to synchronize them properly. Otherwise 'volatile sig_atomic_t' would be enough, right?
Expected: all signals go to one thread (randomly picked) despite of their different signal types, I haven't seen an example of using mutexes and atoms to synchronize signal handlers.
Your understanding is correct: unless a signal was directed to a specific thread, there's no guarantee which thread will handle it.
See POSIX's Signal Generation and Delivery and pthreads(7):
POSIX.1 distinguishes the notions of signals that are directed
to the process as a whole and signals that are directed to
individual threads. According to POSIX.1, a process-directed
signal (sent using kill(2), for example) should be handled by
a single, arbitrarily selected thread within the process.
So it may be delivered to and handled by the same thread that's currently handling another signal (in that case, the previous handler may be interrupted by the new signal). Or it may be delivered to another thread.
You can block other signals while one is being handled, using the sa_mask field of sigaction, to avoid a signal handler being interrupted.
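As a rough illustration of sa_mask, the sketch below installs a SIGUSR1 handler whose mask contains SIGUSR2, then raises SIGUSR2 from inside that handler. The handler names and the trace array are invented for this example. On Linux the SIGUSR2 handler should only run once the SIGUSR1 handler has returned.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Records the order in which the handlers make progress. */
static volatile sig_atomic_t trace[4];
static volatile sig_atomic_t trace_len = 0;

static void on_usr2(int signo)
{
    (void)signo;
    trace[trace_len++] = 3;
}

static void on_usr1(int signo)
{
    (void)signo;
    trace[trace_len++] = 1;
    raise(SIGUSR2);          /* blocked by sa_mask: stays pending for now */
    trace[trace_len++] = 2;
}

/* Hypothetical demo: returns the trace as a three-digit number, or -1. */
int run_mask_demo(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr2;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR2, &sa, NULL) == -1)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    sigaddset(&sa.sa_mask, SIGUSR2);   /* keep SIGUSR2 out while we run */
    if (sigaction(SIGUSR1, &sa, NULL) == -1)
        return -1;

    trace_len = 0;
    raise(SIGUSR1);
    return trace[0] * 100 + trace[1] * 10 + trace[2];
}
```

Without the sigaddset() line, the SIGUSR2 handler would be free to run in the middle of on_usr1 and the recorded order would change.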
SIDE-QUESTION: If signal handlers run in parallel, I'll have to use mutex to synchronize them properly. Otherwise 'volatile sig_atomic_t' would be enough, right?
You almost certainly don't want to use a mutex in a signal handler. Only a few functions can be safely called from a signal handler (you can only call functions that are async-signal-safe).
See signal-safety(7) for more information.
If volatile sig_atomic_t is sufficient for your purpose (do you really need to coordinate the execution of different signal handlers?), it should be preferred.
Expected: all signals go to one thread (randomly picked) despite of their different signal types, I haven't seen an example of using mutexes and atoms to synchronize signal handlers.
This is commonly done by blocking signals that you're interested in from main and fetching/handling them in a specific thread. See pthread_sigmask which also has an example on how to implement this.
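Here is one way that pattern might look, as a sketch rather than a reference implementation. The struct waiter type and the function names are hypothetical, and kill(getpid(), SIGTERM) stands in for an external sender.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical bundle: the signals to accept and the one that arrived. */
struct waiter {
    sigset_t set;
    int received;
};

static void *signal_thread(void *arg)
{
    struct waiter *w = arg;
    /* Accept one of the blocked signals synchronously; no handler runs. */
    if (sigwait(&w->set, &w->received) != 0)
        w->received = -1;
    return NULL;
}

/* Returns the signal number accepted by the dedicated thread, or -1. */
int run_sigwait_demo(void)
{
    struct waiter w;
    sigset_t old;
    pthread_t tid;

    sigemptyset(&w.set);
    sigaddset(&w.set, SIGINT);
    sigaddset(&w.set, SIGTERM);
    w.received = 0;

    /* Block first: threads created afterwards inherit this mask. */
    if (pthread_sigmask(SIG_BLOCK, &w.set, &old) != 0)
        return -1;
    if (pthread_create(&tid, NULL, signal_thread, &w) != 0) {
        pthread_sigmask(SIG_SETMASK, &old, NULL);
        return -1;
    }

    kill(getpid(), SIGTERM);      /* process-directed, like an outside kill */
    pthread_join(tid, NULL);

    pthread_sigmask(SIG_SETMASK, &old, NULL);
    return w.received;
}
```

Because every thread blocks the signals, the signal stays pending until the dedicated thread accepts it, so no handler can interrupt arbitrary code. A real program would call sigwait in a loop.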

Are all threads halted when one of them receives a signal and none of them block it?

I'm running a multithreaded application written in C on Linux.
To stop execution I send SIGINT and from the signal handler call a number of cleanup routines and, finally, call exit(0).
Are the other threads still running or may run (context switch) while the handler executes the cleanup routines?
Handling a signal does not cause the suspension of other threads during execution of the signal handler. Moreover, it's generally not safe to call most functions you would need for cleanup (including even exit!) from a signal handler unless you can ensure that it does not interrupt an async-signal-unsafe function.
What you should do is simply store the fact that SIGINT was received in some async-signal-safe manner and have the program act on that condition as part of its normal flow of execution, outside the signal handler. Then you can properly synchronize with other threads (using mutexes, condition variables, etc.) to achieve a proper, safe shutdown. The ideal method is not to even install a signal handler, but instead block all signals and have a dedicated signal-handling thread calling sigwaitinfo in a loop to accept signals.
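A small sketch of the "record it and act later" approach, assuming a single flag is enough. The function names are invented, and raise(SIGINT) inside the loop merely simulates the user pressing Ctrl-C.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* The only thing the handler touches. */
static volatile sig_atomic_t stop_requested = 0;

static void on_sigint(int signo)
{
    (void)signo;
    stop_requested = 1;      /* a plain store is async-signal-safe */
}

/* Hypothetical setup helper: returns 0 on success, -1 on failure. */
int install_sigint_flag(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGINT, &sa, NULL);
}

/* Hypothetical work loop: does units of work until a stop is requested
 * and returns how many units were completed. */
int work_until_stopped(int interrupt_at)
{
    int done = 0;
    stop_requested = 0;
    while (!stop_requested) {
        done++;                          /* one unit of "work" */
        if (done == interrupt_at)
            raise(SIGINT);               /* stand-in for Ctrl-C */
    }
    /* Cleanup belongs here, in normal context, where mutexes, stdio
     * and exit() are safe to use. */
    return done;
}
```

In a multithreaded program the other threads would need to check the same condition, or be told through a condition variable or queue by whichever thread notices the flag.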
Yes, a signal is delivered to one thread, chosen in an unspecified way. Only threads that aren't blocking the signal are considered, though; if all threads block the signal, it remains queued up until one thread unblocks it.
(So if you make all threads block the signal, you can use the signal as a deterministic, inter-process synchronization mechanism, e.g. using sigwait.)

What is best practice for signal handling in production multi-threaded program on Linux?

I'm writing a multi-threaded program that shall run on a Linux system. I want to be sure that, if the program was in a reliable running condition (i.e. no segmentation faults, no abort, etc...), on exit it finalises a file writing some trailing information. To do so I want to handle the termination signals in order to trigger a graceful shut down.
Since it is a multi-threaded program, all signals are masked in every thread except the main one, which calls sigwait on a signal set containing only the termination signals. All other signals are therefore handled by their default action.
Is this a good practice, or I should provide a custom action for every signal?
Is this a good practice,
Yes, absolutely. Handling termination signals in a multi-threaded environment any other way (that is, without a single thread responsible for them) is virtually impossible.
or I should provide a custom action for every signal?
No. You'd normally want to handle SIGINT, SIGTERM and SIGHUP. SIGKILL can't be handled, and I'd leave SIGQUIT alone so it could be used to core-dump the application.
Use signalfd()!
This lets you handle signals along with file descriptor readiness in a single select() call.
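A sketch of the signalfd approach, Linux-specific. It uses poll() rather than select() purely to keep the example short, and the function name is made up. The signal is sent from within the process only so the example is self-contained.

```c
#define _GNU_SOURCE
#include <poll.h>
#include <signal.h>
#include <sys/signalfd.h>
#include <unistd.h>

/* Hypothetical demo: returns the signal number read from the fd, or -1. */
int wait_for_signal_via_fd(void)
{
    sigset_t mask, old;
    sigemptyset(&mask);
    sigaddset(&mask, SIGINT);
    sigaddset(&mask, SIGTERM);

    /* The signals must be blocked, otherwise they are delivered normally. */
    if (sigprocmask(SIG_BLOCK, &mask, &old) == -1)
        return -1;

    int fd = signalfd(-1, &mask, SFD_CLOEXEC);
    if (fd == -1) {
        sigprocmask(SIG_SETMASK, &old, NULL);
        return -1;
    }

    kill(getpid(), SIGTERM);             /* stand-in for an outside sender */

    /* In a real event loop this pollfd would sit next to sockets, pipes... */
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int result = -1;
    if (poll(&pfd, 1, 1000) == 1 && (pfd.revents & POLLIN)) {
        struct signalfd_siginfo info;
        if (read(fd, &info, sizeof info) == (ssize_t)sizeof info)
            result = (int)info.ssi_signo;
    }

    close(fd);
    sigprocmask(SIG_SETMASK, &old, NULL);
    return result;
}
```

In a multithreaded program the mask would be set with pthread_sigmask() before any threads are created, so that every thread blocks the signals.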

How are asynchronous signal handlers executed on Linux?

I would like to know exactly how the execution of asynchronous signal handlers works on Linux. First, I am unclear as to which thread executes the signal handler. Second, I would like to know the steps that are followed to make the thread execute the signal handler.
On the first matter, I have read two different, seemingly conflicting, explanations:
The Linux Kernel, by Andries Brouwer, §5.2 "Receiving signals" states:
When a signal arrives, the process is interrupted, the current registers are saved, and the signal handler is invoked. When the signal handler returns, the interrupted activity is continued.
The StackOverflow question "Dealing With Asynchronous Signals In Multi Threaded Program" leads me to think that Linux's behavior is like SCO Unix's:
When a signal is delivered to a process, if it is being caught, it will be handled by one, and only one, of the threads meeting either of the following conditions:
A thread blocked in a sigwait(2) system call whose argument does include the type of the caught signal.
A thread whose signal mask does not include the type of the caught signal.
Additional considerations:
A thread blocked in sigwait(2) is given preference over a thread not blocking the signal type.
If more than one thread meets these requirements (perhaps two threads are calling sigwait(2)), then one of them will be chosen. This choice is not predictable by application programs.
If no thread is eligible, the signal will remain ``pending'' at the process level until some thread becomes eligible.
Also, "The Linux Signals Handling Model" by Moshe Bar states "Asynchronous signals are delivered to the first thread found not blocking the signal.", which I interpret to mean that the signal is delivered to some thread having its sigmask not including the signal.
Which one is correct?
On the second matter, what happens to the stack and register contents for the selected thread? Suppose the thread-to-run-the-signal-handler T is in the middle of executing a do_stuff() function. Is thread T's stack used directly to execute the signal handler (i.e. the address of the signal trampoline is pushed onto T's stack and control flow goes to the signal handler)? Alternatively, is a separate stack used? How does it work?
These two explanations really aren't contradictory if you take into account the fact that Linux hackers tend to be confused about the difference between a thread and a process, mainly due to the historical mistake of trying to pretend threads could be implemented as processes that share memory. :-)
With that said, explanation #2 is much more detailed, complete, and correct.
As for the stack and register contents, each thread can register its own alternate signal-handling stack, and the process can choose on a per-signal basis which signals will be delivered on alternate signal-handling stacks. The interrupted context (registers, signal mask, etc.) will be saved in a ucontext_t structure on the (possibly alternate) stack for the thread, along with the trampoline return address. Signal handlers installed with the SA_SIGINFO flag are able to examine this ucontext_t structure if they like, but the only portable thing they can do with it is examine (and possibly modify) the saved signal mask. (I'm not sure if modifying it is sanctioned by the standard, but it's very useful because it allows the signal handler to atomically replace the interrupted code's signal mask upon return, for instance to leave the signal blocked so it can't happen again.)
Source #1 (Andries Brouwer) is correct for a single-threaded process. Source #2 (SCO Unix) is wrong for Linux, because Linux does not prefer threads in sigwait(2). Moshe Bar is correct about the first available thread.
Which thread gets the signal? Linux's manual pages are a good reference. A process uses clone(2) with CLONE_THREAD to create multiple threads. These threads belong to a "thread group" and share a single process ID. The manual for clone(2) says,
Signals may be sent to a thread group as a whole (i.e., a
TGID) using kill(2), or to a specific thread (i.e., TID) using
tgkill(2).
Signal dispositions and actions are process-wide: if an
unhandled signal is delivered to a thread, then it will affect
(terminate, stop, continue, be ignored in) all members of the
thread group.
Each thread has its own signal mask, as set by sigprocmask(2),
but signals can be pending either: for the whole process
(i.e., deliverable to any member of the thread group), when
sent with kill(2); or for an individual thread, when sent with
tgkill(2). A call to sigpending(2) returns a signal set that
is the union of the signals pending for the whole process and
the signals that are pending for the calling thread.
If kill(2) is used to send a signal to a thread group, and the
thread group has installed a handler for the signal, then the
handler will be invoked in exactly one, arbitrarily selected
member of the thread group that has not blocked the signal.
If multiple threads in a group are waiting to accept the same
signal using sigwaitinfo(2), the kernel will arbitrarily
select one of these threads to receive a signal sent using
kill(2).
Linux is not SCO Unix, because Linux might give the signal to any thread, even if some threads are waiting for a signal (with sigwaitinfo, sigtimedwait, or sigwait) and some threads are not. The manual for sigwaitinfo(2) warns,
In normal usage, the calling program blocks the signals in set via a
prior call to sigprocmask(2) (so that the default disposition for
these signals does not occur if they become pending between
successive calls to sigwaitinfo() or sigtimedwait()) and does not
establish handlers for these signals. In a multithreaded program,
the signal should be blocked in all threads, in order to prevent the
signal being treated according to its default disposition in a thread
other than the one calling sigwaitinfo() or sigtimedwait()).
The code to pick a thread for the signal lives in linux/kernel/signal.c. See the functions wants_signal() and complete_signal(). The code picks the first available thread for the signal. An available thread is one that doesn't block the signal and has no other signals in its queue. The code happens to check the main thread first, then it checks the other threads in some order unknown to me. If no thread is available, then the signal is stuck until some thread unblocks the signal or empties its queue.
What happens when a thread gets the signal? If there is a signal handler, then the kernel causes the thread to call the handler. Most handlers run on the thread's stack. A handler can run on an alternate stack if the process uses sigaltstack(2) to provide the stack, and sigaction(2) with SA_ONSTACK to set the handler. The kernel pushes some things onto the chosen stack, and sets some of the thread's registers.
To run the handler, the thread must be running in userspace. If the thread is running in the kernel (perhaps for a system call or a page fault), then it does not run the handler until it goes to userspace. The kernel can interrupt some system calls, so the thread runs the handler now, without waiting for the system call to finish.
The signal handler is a C function, so the kernel obeys the architecture's convention for calling C functions. Each architecture, like arm, i386, powerpc, or sparc, has its own convention. For powerpc, to call handler(signum), the kernel sets the register r3 to signum. The kernel also sets the handler's return address to the signal trampoline. The return address goes on the stack or in a register by convention.
The kernel puts one signal trampoline in each process. This trampoline calls sigreturn(2) to restore the thread. In the kernel, sigreturn(2) reads some information (like saved registers) from the stack. The kernel had pushed this information on the stack before calling the handler. If there was an interrupted system call, the kernel might restart the call (only if the handler used SA_RESTART), or fail the call with EINTR, or return a short read or write.

POSIX threads and signals

I've been trying to understand the intricacies of how POSIX threads and POSIX signals interact. In particular, I'm interested in:
What's the best way to control which thread a signal is delivered to (assuming it isn't fatal in the first place)?
What is the best way to tell another thread (that might actually be busy) that the signal has arrived? (I already know that it's a bad idea to be using pthread condition variables from a signal handler.)
How can I safely handle passing the information that a signal has occurred to other threads? Does this need to happen in the signal handler? (I do not in general want to kill the other threads; I need a far subtler approach.)
For reference about why I want this, I'm researching how to convert the TclX package to support threads, or to split it up and at least make some useful parts support threads. Signals are one of those parts that is of particular interest.
What's the best way to control which thread
a signal is delivered to?
As #zoli2k indicated, explicitly nominating a single thread to handle all signals you want handled (or a set of threads each with specific signal responsibilities), is a good technique.
What is the best way to tell another thread (that might actually be busy)
that the signal has arrived?[...]
How can I safely handle passing the information that a signal has occurred
to other threads? Does this need to happen in the signal handler?
I won't say "best," but here's my recommendation:
Block all desired signals in main, so that all threads inherit that signal mask. Then, fashion the special signal-receiving thread as a signal-driven event loop, dispatching newly arrived signals as some other form of inter-thread communication.
The simplest way to do this is to have the thread accept signals in a loop using sigwaitinfo or sigtimedwait. The thread then converts the signals somehow, perhaps broadcasting a pthread_cond_t, waking up other threads with more I/O, enqueuing a command in an application-specific thread-safe queue, whatever.
Alternatively, the special thread could allow signals to be delivered to a signal handler, unmasking for delivery only when ready to handle signals. (Signal delivery via handlers tends to be more error-prone than signal acceptance via the sigwait family, however.) In this case, the receiver's signal handler performs some simple and async-signal-safe action: setting sig_atomic_t flags, calling sigaddset(&signals_i_have_seen_recently, latest_sig), write() a byte to a non-blocking self-pipe, etc. Then, back in its masked main loop, the thread communicates receipt of the signal to other threads as above.
(UPDATED #caf rightly points out that sigwait approaches are superior.)
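For the handler-based variant, the self-pipe idea could be sketched as follows. The handler only writes one byte carrying the signal number, and the ordinary code path reads it back. All names are hypothetical, and raise() stands in for a real sender.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static int pipe_fds[2] = { -1, -1 };

static void notify_pipe(int signo)
{
    int saved_errno = errno;             /* write() may clobber errno */
    unsigned char byte = (unsigned char)signo;
    ssize_t n = write(pipe_fds[1], &byte, 1);   /* async-signal-safe */
    (void)n;                             /* a full pipe just drops the byte */
    errno = saved_errno;
}

/* Hypothetical demo: returns the signal number read from the pipe, or -1. */
int self_pipe_demo(void)
{
    struct sigaction sa;

    if (pipe(pipe_fds) == -1)
        return -1;

    /* Non-blocking write end, so the handler can never stall. */
    int flags = fcntl(pipe_fds[1], F_GETFL);
    if (flags == -1 || fcntl(pipe_fds[1], F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = notify_pipe;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) == -1)
        return -1;

    raise(SIGUSR1);                      /* stand-in for a real sender */

    /* Back in normal context: the read end could equally be watched by
     * select() or poll() alongside other descriptors. */
    unsigned char byte = 0;
    ssize_t n = read(pipe_fds[0], &byte, 1);

    close(pipe_fds[0]);
    close(pipe_fds[1]);
    return n == 1 ? (int)byte : -1;
}
```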
According to the POSIX standard, all threads of a process should appear under the same PID on the system, and with pthread_sigmask() you can define the signal blocking mask for each thread.
Since only one handler can be installed per signal per process, I prefer to handle all signals in one thread and call pthread_cancel() if a running thread needs to be cancelled. I prefer this to pthread_kill() because it allows cleanup functions to be defined for the threads.
On some older systems, because of the lack of proper kernel support, the running threads may have different PID from the parent thread's PID. See FAQ for signal handling with linuxThreads on Linux 2.4.
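A short sketch of the cancellation approach described in this answer, assuming default (deferred) cancellation. The worker, the cleanup function and the flag are invented for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <unistd.h>

/* Hypothetical cleanup function: here it just sets a flag, but it could
 * release a lock, close a file, and so on. */
static void release_resources(void *arg)
{
    int *cleaned = arg;
    *cleaned = 1;
}

static void *worker(void *arg)
{
    pthread_cleanup_push(release_resources, arg);
    for (;;)
        sleep(1);                /* sleep() is a cancellation point */
    pthread_cleanup_pop(0);      /* never reached; pairs with the push */
    return NULL;
}

/* Returns 1 if the worker was cancelled and its cleanup ran, else 0 or -1. */
int cancel_demo(void)
{
    pthread_t tid;
    int cleaned = 0;
    void *result = NULL;

    if (pthread_create(&tid, NULL, worker, &cleaned) != 0)
        return -1;
    if (pthread_cancel(tid) != 0)
        return -1;
    if (pthread_join(tid, &result) != 0)
        return -1;

    return (result == PTHREAD_CANCELED && cleaned == 1) ? 1 : 0;
}
```

The signal-handling thread would call pthread_cancel() on the workers after accepting a termination signal, then join them before exiting.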
Where I'm at so far:
Signals come in different major classes, some of which should typically just kill the process anyway (SIGILL) and some of which never need anything doing (SIGIO; easier to just do async IO right anyway). Those two classes need no action.
Some signals don't need to be dealt with immediately; the likes of SIGWINCH can be queued up until it is convenient (just like an event from X11).
The tricky ones are the ones where you want to respond to them by interrupting what you're doing but without going to the extent of wiping out a thread. In particular, SIGINT in interactive mode ought to leave things responsive.
I've still got to sort through signal vs sigaction, pselect, sigwait, sigaltstack, and a whole bunch of other bits and pieces of POSIX (and non-POSIX) API.
IMHO, Unix V signals and posix threads do not mix well.
Unix V is 1970. POSIX is 1980 ;)
There are cancellation points, and if you allow signals and pthreads in one application, you will eventually end up writing loops around each call that can unexpectedly return EINTR.
So what I did in the (few) cases where I had to write multithreaded programs on Linux or QNX was to mask out all signals for all (but one) threads.
When a Unix V signal arrives, the process switches the stack (that was as much concurrency as you could get within a process in Unix V).
As the other posts here hint, it might now be possible to tell the system which POSIX thread shall be the victim of that stack switching.
Once you have your signal handler thread working, the question remains how to transform the signal information into something civilized that other threads can use. An infrastructure for inter-thread communication is required. One useful pattern is the actor pattern, where each of your threads is a target for some in-process messaging mechanism.
So, instead of canceling other threads or killing them (or other weird stuff), you should try to marshal the signal from the signal context to your signal handler thread, then use your actor-pattern communication mechanisms to send semantically useful messages to those actors that need the signal-related information.

Resources