I want to write code that switches between threads every 10 microseconds.
But the problem is in the yield function. I get an interrupt while running the timer handler. So it doesn't finish properly.
This is the code I have for initializing the timer:
signal(SIGALRM, &time_handler);
struct itimerval t1;
t1.it_interval.tv_sec = INTERVAL_SEC;
t1.it_interval.tv_usec = INTERVAL_USEC;
t1.it_value.tv_sec = INTERVAL_SEC;
t1.it_value.tv_usec = INTERVAL_USEC;
setitimer(ITIMER_REAL, &t1, NULL);
And this is the code for the handler function:
void time_handler(int signo)
{
write(STDOUT_FILENO, "interrupt\n", sizeof("interrupt\n") - 1); /* -1: do not write the trailing NUL */
green_yield();
}
And this is what I do in the yield function: I keep a queue from which we get the thread to run next. The problem is that at any moment before I swap context between threads, I can get an interrupt, especially because I swap the context at the end of this function.
int green_yield(){
green_t *susp = running ;
// add susp to ready queue
// ===========================
enQueue(ready_queue, susp);
// ===========================
// select the next thread for execution
// ===========================
green_t * next = deQueue(ready_queue);
running = next;
// ===========================
// save current state into susp->context and switch to next->context
// ===========================
swapcontext(susp->context, next->context);
return 0;
}
What can I do to make sure that I first complete the yield function and then get the interrupt?
Foreword: Depending on your system hardware, a write() system call into stdout may take longer than 10 us. So, calling this from the SIGALRM handler with a cyclic timer of 10 us may be wrong.
In GLIBC, signal(SIGALRM, time_handler) is equivalent to sigaction() with SA_RESTART flag. SIGALRM signal is blocked during the execution of the handler. So, you will not receive a signal while running the handler. It is implicitly blocked during handler execution and unblocked after it finishes. Since the latter calls green_yield(), you will not get a signal while running inside green_yield().
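The claim that SIGALRM is blocked while its own handler runs can be checked with a small probe. This is only a sketch: probe_handler and alarm_blocked_during_handler are names made up for the illustration, and the handler is installed with sigaction() and SA_RESTART to mirror what signal() is described as doing above.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* 1 if SIGALRM was blocked while the handler ran, 0 if not, -1 if never run. */
static volatile sig_atomic_t alarm_blocked_in_handler = -1;

static void probe_handler(int signo)
{
    sigset_t current;

    (void)signo;
    sigemptyset(&current);
    /* A NULL "set" argument only queries the mask; nothing is changed. */
    sigprocmask(SIG_BLOCK, NULL, &current);
    alarm_blocked_in_handler = sigismember(&current, SIGALRM);
}

/* Hypothetical helper: reports what the handler saw in its own signal mask. */
int alarm_blocked_during_handler(void)
{
    struct sigaction sa;
    int rc;

    sa.sa_handler = probe_handler;
    sigemptyset(&sa.sa_mask);   /* no additional signals blocked */
    sa.sa_flags = SA_RESTART;   /* no SA_NODEFER, so SIGALRM itself is blocked */
    rc = sigaction(SIGALRM, &sa, NULL);
    assert(rc == 0);

    raise(SIGALRM);             /* the handler runs before raise() returns */
    return alarm_blocked_in_handler;
}
```

If the function returns 1, a second SIGALRM cannot interrupt code called from the handler (such as green_yield()) until the handler returns or the mask is changed, for example by switching to a context whose saved mask has SIGALRM unblocked.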
getcontext() saves the signal mask with SIGALRM unblocked (I guess you call it at the beginning of your program, when you create the threads). So, when you swap contexts to go from an interrupted thread running the signal handler to the next schedulable thread, the newly running thread:
The first time it is scheduled, returns from its getcontext() (the thread creation point). This restores a signal mask with SIGALRM unblocked, even though the previous thread did not return from the signal handler, because that is the mask stored in the context. When the timer elapses again, SIGALRM interrupts the newly running thread, which yields the CPU from the signal handler by calling swapcontext(). This time the saved context contains a signal mask with SIGALRM blocked;
On subsequent scheduling, returns from swapcontext(), since it was interrupted by the signal and was running the end of the signal handler. The context restores a mask with SIGALRM blocked, but the signal is unblocked again when the handler returns, since execution resumes from the end of the signal handler.
Even if the preceding is supposed to work, note that when a signal is raised, the system creates a stack frame on the top of the current process stack to make the signal handler appear as a function called by the user program and returning at the interruption point. This frame on the stack must not be corrupted by threads running from any point on the global process stack. The use of sigaltstack() may be considered (see notes below).
What about your thread implementation? They all share the same stack (the process stack). When you create them, they all save their context with getcontext() nearly at the same point in the global process stack. So, when you switch from one thread to another, the newly running thread may screw up the stack frames of the previously running threads... I think this is the point on which you should focus: arrange your threads to make them run with their own global stack zone or with their own stack using something like makecontext(). The manual of the latter provides an example to create several threads of execution with separate stacks.
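As a rough sketch of the "own stack per thread" suggestion, the following gives each context a separately allocated stack with makecontext() and lets two workers hand control back and forth. The names (make_thread, worker_a, worker_b, run_two_threads) and the 64 KiB stack size are assumptions made for the example, not part of the original code, and the stacks are never freed to keep it short.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)   /* assumed to be enough for this demo */

static ucontext_t main_ctx, ctx_a, ctx_b;
static char trace[16];
static size_t trace_len;

static void record(char c)
{
    if (trace_len < sizeof trace - 1)
        trace[trace_len++] = c;
}

static void worker_a(void)
{
    record('a');
    swapcontext(&ctx_a, &ctx_b);   /* yield to B */
    record('A');
    /* Returning here resumes uc_link, i.e. main_ctx. */
}

static void worker_b(void)
{
    record('b');
    swapcontext(&ctx_b, &ctx_a);   /* yield back to A */
    record('B');                   /* not reached in this demo */
}

/* Prepare a context that runs fn on its own heap-allocated stack. */
static int make_thread(ucontext_t *ctx, void (*fn)(void))
{
    void *stack = malloc(STACK_SIZE);

    assert(fn != NULL);
    if (stack == NULL || getcontext(ctx) == -1) {
        free(stack);
        return -1;
    }
    ctx->uc_stack.ss_sp = stack;
    ctx->uc_stack.ss_size = STACK_SIZE;
    ctx->uc_link = &main_ctx;      /* where to go when fn returns */
    makecontext(ctx, fn, 0);
    return 0;
}

/* Runs both workers and returns the order in which they executed. */
const char *run_two_threads(void)
{
    trace_len = 0;
    memset(trace, 0, sizeof trace);
    if (make_thread(&ctx_a, worker_a) != 0 || make_thread(&ctx_b, worker_b) != 0)
        return NULL;
    swapcontext(&main_ctx, &ctx_a);
    return trace;
}
```

Because every context has its own stack, a worker that resumes does not overwrite the frames of a worker that was suspended earlier, which is the corruption risk described above for threads sharing the process stack.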
Side note:
swapcontext() is not part of the allowed function calls in the signal handlers: cf. man 7 signal-safety. So, it is not safe to call it from there. But at the same time, we can see that non-local gotos (i.e. longjmp()) can safely be called from the signal handler. Since swapcontext() looks like a non local goto, it may be safe to call it under the same conditions as longjmp()...
The manual of sigaltstack() provides some tips on using swapcontext() from signal handlers.
Related
Let's say we have a program in C that uses the sleep() function
The program executes and goes to sleep. Then we type Ctrl+C to send a SIGINT signal to the process.
We know that the default action upon receipt of SIGINT is to terminate the process. We also know that sleep() returns early whenever the sleeping process receives a signal.
And my textbook says in order to allow sleep() function to return, we must install a SIGINT handler like this:
void handler(int sig){
return; /* Catch the signal and return */
}
...
int main(int argc, char **argv) {
...
if (signal(SIGINT, handler) == SIG_ERR) /* Install SIGINT handler */
unix_error("signal error\n");
...
sleep(1000);
}
Although the code seems straightforward, I still have questions if I want to dig deeper:
Background: When the process is sleeping and we type Ctrl+C to send SIGINT
Q1-My understanding is, Kernel sends SIGINT to the process by updating the SIGINT's corresponding pending bit in the pending bit vector, is my understanding correct?
Q2-The processor detects the existence of SIGINT, but since we override the handler to make it return instead of terminating the process, our handler gets executed, and then Kernel clears SIGINT's corresponding pending bit, is my understanding correct?
Q3- Since SIGINT's corresponding pending bit is cleared, then how can the sleep() function return? I think it should still be sleeping because, in theory, sleep() has no way of knowing of the existence of SIGINT (it has been cleared)
Q1: the kernel checks whether the process has blocked the received signal. If so, it updates the pending signal bit in the process entry (this is the unreliable variant; on systems with reliable signals it should be a counter), so that the signal handler is called when signals are unblocked again (see below). If the signal is not blocked, the system call prepares the return value and the errno value and returns to user mode with special code installed on the program's virtual stack that makes it call the signal handler (already in user mode) before returning from the generic syscall code. The system call returns -1 to the caller, and the errno variable is set to EINTR. This requires the process to have installed a signal handler, because the default action is to abort the process, in which case it never returns from the system call it is waiting on. Keep in mind that when one says "the kernel", the code actually executed is the system call being awakened and notified of the special condition (a signal was received). The interrupted call detects that a signal handler is to be called and prepares the user stack to jump to the proper place (the signal handler in user code) before returning from the syscall() wrapper.
Q2: pending bit is only used to save that a pending signal handler is to be called, so this is not the case. In the execution part of the process, the unix program loader installs some basic code to jump to the signal handler before returning from the system call. This is because the signal handler has to execute in user mode (not in kernel mode) so everything happens upon termination of system call. The signal handler executed is the SIGINT, but the code interrupted is a system call, and nothing happens until the system call returns (with the return code and the errno variable already fixed)
Q3: well, your reasoning was based on a wrong premise, namely that the pending flag indicates that a signal has been received. This bit only records that an unprocessed signal has been marked for delivery as soon as you unblock it, and that only happens in another system call (the one that unblocks the signal). As soon as the signal is unblocked, the return path of the sigsetmask(2) syscall will execute the signal handler. In this case, the signal will be delivered to the process as soon as the timer elapses, the system call will be interrupted and, if you have not installed a signal handler for the SIGALRM signal (the sleep(2) implementation does this, or at least old implementations did), the program will be aborted.
NOTE
When I say that the program is aborted by the kernel, note that in both cases the signals involved (SIGINT and SIGALRM) do not make it dump a core file. The program is terminated without generating a core. This differs from the behaviour of the abort() routine, which sends SIGABRT and so makes the kernel dump a core file of the process.
Q3- Since SIGINT's corresponding pending bit is cleared, then how can the sleep() function return?
Imagine the sleep() function in the kernel as a function that:
allocates and sets fields in some kind of "timer event" structure
adds the "timer event" to a list of timer events for the timer's IRQ handler to worry about later (when the expiry time has elapsed)
moves the task from the "RUNNING" state to the "SLEEPING" state (so the scheduler knows not to give the task CPU time), causing scheduler to do a task switch to some other task
configures return parameters for user-space (the amount of time remaining or 0 if the time expired)
figures out why the scheduler gave it CPU time again (did the time expire or was the sleep interrupted by a signal?)
potentially mangles the stack a bit (so that the kernel returns to the signal handler if the sleep() was interrupted by a signal instead of returning to the code that called sleep())
returns to user-space
Also imagine that there's a second function (that I'm going to call wake() for no particular reason) that:
removes the "timer event" from the list of timer events (for the timer's IRQ handler to worry)
moves the task from the "SLEEPING" state to the "READY TO RUN" state (so the scheduler knows that the task can be given CPU time again)
Naturally, if the timer's IRQ handler notices that the "timer event" has expired then the timer's IRQ handler would call the wake() function to wake the task up again.
Now imagine there's a third function (that I'm going to call send_signal()) which might be called by other functions (e.g. called by kill()). This function might set a "pending signal" flag for the task that's supposed to receive the signal, then check what state the receiving task is in; and if the receiving task is in the "SLEEPING" state it calls the wake() function to wake it up (and then lets the latter part of the sleep() function worry about delivering the signal back to user-space whenever the scheduler feels like giving the task CPU time later).
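The three imagined functions can be written down as a toy state machine. This is not kernel code: the struct, the toy_ names and the bitmask are invented purely to mirror the description above, and scheduling itself is left out.

```c
#include <assert.h>

enum task_state { TASK_RUNNING, TASK_SLEEPING, TASK_READY };

struct task {
    enum task_state state;
    int timer_armed;    /* stands in for the "timer event" on the timer list */
    unsigned pending;   /* one bit per pending signal */
};

/* First half of the imagined sleep(): arm the timer and stop running. */
void toy_sleep_begin(struct task *t)
{
    t->timer_armed = 1;
    t->state = TASK_SLEEPING;
}

/* The imagined wake(): drop the timer event and mark the task runnable. */
void toy_wake(struct task *t)
{
    t->timer_armed = 0;
    t->state = TASK_READY;
}

/* The imagined send_signal(): mark the signal pending, wake a sleeper. */
void toy_send_signal(struct task *t, int sig)
{
    assert(sig >= 0 && sig < 32);
    t->pending |= 1u << sig;
    if (t->state == TASK_SLEEPING)
        toy_wake(t);
}

/* Second half of sleep(): runs once the scheduler picks the task again.
 * Returns 1 if a signal cut the sleep short, 0 if the timer expired. */
int toy_sleep_finish(struct task *t)
{
    t->state = TASK_RUNNING;
    return t->pending != 0;
}

/* Walks through "sleep, then a signal arrives" and reports the outcome. */
int toy_demo(void)
{
    struct task t = { TASK_RUNNING, 0, 0 };

    toy_sleep_begin(&t);
    toy_send_signal(&t, 2);
    if (t.state != TASK_READY || t.timer_armed)
        return -1;
    return toy_sleep_finish(&t);
}
```

The point of the sketch is that the pending bit is still set when the second half of sleep() runs, which is how it can tell an interrupted sleep from an expired one.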
Your understanding is correct.
Think about it. The process is blocked in the kernel. We need to return to user space to run the handler. How can we do that without interrupting whatever blocking kernel call was running? We only have one process/thread context to work with here. The process can't be both sleeping and running a signal handler.
The sequence is:
Process blocks in some blocking kernel call.
Signal is sent to it.
Bit is set, process is made ready-to-run.
Process resumes running in kernel mode, checks for pending non-blocked signals.
Signal dispatcher is invoked.
Process context is modified to execute signal handler upon resumption.
Process is resumed in user space
Signal handler runs.
Signal handler returns.
Kernel is invoked by end of signal handler.
Kernel makes decision whether to resume system call or return interruption error.
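The sequence above can be observed from user space. The sketch below is an assumption-laden demo rather than the textbook's code: it uses nanosleep() instead of sleep() (to avoid any interaction between sleep() and SIGALRM on systems where that matters), a one-shot 50 ms timer as the signal source, and made-up names on_alarm and sleep_is_interrupted.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/time.h>
#include <time.h>

static volatile sig_atomic_t handler_ran;

static void on_alarm(int signo)
{
    (void)signo;
    handler_ran = 1;   /* catch the signal and return */
}

/* Returns 1 if a 5 s sleep was cut short by the signal handler running. */
int sleep_is_interrupted(void)
{
    struct sigaction sa;
    struct itimerval timer;
    struct timespec request, remaining;
    int rc;

    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    rc = sigaction(SIGALRM, &sa, NULL);
    assert(rc == 0);

    /* One-shot timer: deliver SIGALRM after 50 ms. */
    timer.it_interval.tv_sec = 0;
    timer.it_interval.tv_usec = 0;
    timer.it_value.tv_sec = 0;
    timer.it_value.tv_usec = 50000;
    rc = setitimer(ITIMER_REAL, &timer, NULL);
    assert(rc == 0);

    request.tv_sec = 5;
    request.tv_nsec = 0;
    remaining.tv_sec = 0;
    remaining.tv_nsec = 0;
    rc = nanosleep(&request, &remaining);   /* blocks in the kernel */

    /* The handler ran first, then the call failed with EINTR. */
    return rc == -1 && errno == EINTR && handler_ran && remaining.tv_sec > 0;
}
```

The blocking call does not need to inspect the pending bit afterwards; it is woken because of the signal, the handler runs on the way back to user space, and the call reports the interruption through its return value.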
Can someone explain why we should not call non-async-signal-safe functions from signal handlers? What is the exact sequence of steps that corrupts a program when such functions are called?
And do signal handlers always run on a separate stack? If so, is it a separate context, or do they run in the context of the signaled thread?
Finally, in a multi-threaded system, what happens when a signal handler is executing and some other thread is signaled and calls the same signal handler?
(I am trying to develop deep understanding of signals and its applications)
When a process receives a signal, it is handled in the context of the process. You should only use async-signal-safe or re-entrant functions from inside a signal handler. For instance, you cannot call malloc() or printf() within a signal handler. The reason being:
*) Let's assume your process was executing in malloc when you received the signal. So the global heap data structures are in an inconsistent state. Now if you acquire the heap lock from inside your signal handler and make changes you will further render the heap inconsistent.
*) Another possibility is if the heap lock has been acquired by your process when it received the signal, and then you call malloc() from your signal handler, it sees that lock is held and it waits infinitely to acquire the lock (infinitely because the thread that can release the lock will not run till the signal is completely handled).
2) Signals run in the context of the process. As for the signal stack you can look at this SO answer -> Do signal handers have a separate stack?
3) As for getting multiple instances of the same signal you can look at this link -> Signal Handling in UNIX where Rumple Stiltskin answers it well.
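To contrast with the malloc()/printf() problem described above, here is a sketch of a handler that only uses write(), which is on the async-signal-safe list. The pipe, the log_fd variable and the function names are assumptions added so the output can be captured and checked; a real program would more likely write to STDERR_FILENO.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static int log_fd = STDERR_FILENO;

static void logging_handler(int signo)
{
    static const char msg[] = "got signal\n";
    int saved_errno = errno;   /* write() may change errno */
    ssize_t n;

    (void)signo;
    /* No locks, no heap, no stdio buffers: just one system call. */
    n = write(log_fd, msg, sizeof msg - 1);   /* -1 skips the trailing NUL */
    (void)n;
    errno = saved_errno;
}

/* Sends SIGUSR1 to ourselves and returns what the handler wrote. */
const char *capture_handler_output(void)
{
    static char captured[32];
    struct sigaction sa;
    int fds[2];
    int rc;
    ssize_t n;

    if (pipe(fds) == -1)
        return NULL;
    log_fd = fds[1];

    sa.sa_handler = logging_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    rc = sigaction(SIGUSR1, &sa, NULL);
    assert(rc == 0);

    raise(SIGUSR1);

    n = read(fds[0], captured, sizeof captured - 1);
    if (n < 0)
        n = 0;
    captured[n] = '\0';

    close(fds[0]);
    close(fds[1]);
    log_fd = STDERR_FILENO;
    return captured;
}
```

Since the handler never takes a lock that the interrupted code might already hold, neither the deadlock nor the heap corruption scenario can arise from it.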
I know some Solaris. So I'm using that for details. LWP==Solaris for "thread" as in pthreads.
trap signals like SIGILL, are delivered to the thread that caused the trap. Asynchronous signals are delivered to the first active thread (LWP), or process that is not blocking that signal. A kernel module called aslwp() traverses the process-header table (has associated LWP's) looking for the first likely candidate to receive the asynch signal.
A signal stack lives in the kernel. I'm not sure what/how to answer your signal stack question.
One process may have several pending signals. Is that what you mean?
Each signal destined for a process is held there until the process switches context (or is forced) into the active state. This in part because you generally cannot incur a trap when the process context has been swapped out and the process does nothing cpu-wise. You certainly can incur asynch signals. But the process cannot "do anything" with any signal if it cannot run. So, at this point the kernel swaps the context back to active, and the signal is delivered via aslwp().
Realtime signals behave differently, and I'm letting it stay with that.
Try reading this:
developers.sun.com/solaris/articles/signalprimer.html
I'm trying to write a signal handler to catch any number of consecutive SIGINT signals and prevent the program from exiting. The program is a simple file server. The handler sets a global flag which causes the while loop accepting new connections to end, a call to pthread_exit() ensures that main lets current connections finish before exiting. It all goes like clockwork when I hit ctrl-C once but a second time exits the program immediately.
I tried first with signal():
signal(SIGINT, catch_sigint);
...
static void catch_sigint(int signo)
{
...
signal(SIGINT, catch_sigint);
}
I also tried it using sigaction:
struct sigaction sigint_handler;
sigint_handler.sa_handler = catch_sigint;
sigemptyset(&sigint_handler.sa_mask);
sigint_handler.sa_flags = 0;
sigaction(SIGINT, &sigint_handler, NULL);
Unsure how to "reinstall" this one, I just duplicated this code in the handler, similar to the handler using the signal() method.
Neither one of these works as I expected.
Additional info:
The program is a simple file server. It receives a request from the client which is simply a string consisting of the requested file name. It utilizes pthreads so that transfers can occur simultaneously. Upon receiving SIGINT I wish for the server to exit the while loop and wait for all current transfers to complete then close. As is, no matter how I code the signal handler a second SIGINT terminates the program immediately.
int serverStop = 0;
...
int main()
{
/* set up the server -- socket(), bind() etc. */
struct sigaction sigint_handler;
sigint_handler.sa_handler = catch_sigint;
sigint_handler.sa_flags = 0;
sigemptyset(&sigint_handler.sa_mask);
sigaction(SIGINT, &sigint_handler, NULL);
/* signal(SIGINT, catch_sigint); */
while(serverStop == 0)
{
/* accept new connections and pthread_create() for each */
}
pthread_exit(NULL);
}
...
static void catch_sigint(int signo)
{
serverStop = 1;
/* signal(SIGINT, catch_sigint) */
}
I don't think any other code could be pertinent but feel free to ask for elaboration
On Linux, you should not have to reinstall the signal handler, using either signal (which implements BSD semantics by default) or sigaction.
when I hit ctrl-C once but a second time exits the program immediately.
That's not because your handler got reset, but likely because your signal handler is doing something it shouldn't.
Here is how I would debug this issue: run the program under GDB and
(gdb) catch syscall exit
(gdb) catch syscall exit_group
(gdb) run
Now wait a bit for the program to start working, and hit Control-C. That will give you (gdb) prompt. Now continue the program as if it has received SIGINT: signal SIGINT (this will invoke your handler). Repeat the 'Control-C/signal SIGINT' sequence again. If you get stopped in either exit or exit_group system call, see where that is coming from (using GDB where command).
Update:
Given the new code you posted, it's not clear exactly where you call pthread_exit to "ensures that main lets current connections finish before exiting". As written, your main thread will exit the loop on first Control-C, and proceed to call exit which would not wait for other threads to finish.
Either you didn't show your actual code, or the "second Control-C" is a red herring and your first Control-C takes you out already (without finishing work in other threads).
NOTE: this is largely guesswork.
I'm pretty sure that calling pthread_exit in the main thread is a bad idea. If the main thread has quit, then the OS may try to send subsequent signals to some other thread.
I recommend that instead of using pthread_exit in the main thread, you just pthread_join() all the other threads, then exit normally.
But it's also important to ensure that the other threads do not get the signals. Normally this is done with sigprocmask (or maybe more correctly pthread_sigmask, which is the same under Linux) to mask the signal out in the worker threads. This ensures that the signal is never delivered to them.
Note that to avoid race conditions, you should use pthread_sigmask in the main thread just before creating a child thread, then set the signal mask back again in the main thread afterwards. This ensures that there is no window, however small, during which a child thread can possibly get unwanted signals.
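A minimal sketch of that block, create, restore pattern follows. The worker here only reports its own signal mask so the effect can be checked; worker and spawn_worker_with_sigint_blocked are illustrative names, and on older toolchains the program needs to be built with -pthread.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stddef.h>

/* Worker: record whether SIGINT is blocked in this thread. */
static void *worker(void *arg)
{
    int *sigint_blocked = arg;
    sigset_t current;

    sigemptyset(&current);
    pthread_sigmask(SIG_BLOCK, NULL, &current);   /* query only */
    *sigint_blocked = sigismember(&current, SIGINT);
    return NULL;
}

/* Returns 1 if the worker started with SIGINT blocked and the calling
 * thread ended up with its original mask again. */
int spawn_worker_with_sigint_blocked(void)
{
    sigset_t block, old, after;
    pthread_t tid;
    int worker_blocked = -1;
    int rc;

    /* 1. Block SIGINT in the creating thread. */
    sigemptyset(&block);
    sigaddset(&block, SIGINT);
    rc = pthread_sigmask(SIG_BLOCK, &block, &old);
    assert(rc == 0);

    /* 2. The new thread inherits the creator's mask, so there is no
     *    window in which it could receive SIGINT. */
    rc = pthread_create(&tid, NULL, worker, &worker_blocked);

    /* 3. Restore the creator's mask so it keeps handling SIGINT. */
    pthread_sigmask(SIG_SETMASK, &old, NULL);
    if (rc != 0)
        return -1;

    pthread_join(tid, NULL);

    sigemptyset(&after);
    pthread_sigmask(SIG_BLOCK, NULL, &after);
    return worker_blocked == 1 &&
           sigismember(&after, SIGINT) == sigismember(&old, SIGINT);
}
```

Joining the workers from the main thread, as recommended above, then replaces the pthread_exit() call at the end of main().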
I'm not sure I understand. A signal handler should usually not re-install any signal handler (including itself), because the handler stays installed until another one is installed. See also the SA_NODEFER flag to sigaction, which lets the signal be caught while it is being handled.
A signal handler should be short. See my answer to this question. It usually mostly sets a volatile sig_atomic_t variable.
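A sketch of that pattern, under the assumption that the server loop can poll a flag between iterations: the handler does nothing except set a volatile sig_atomic_t. The loop body is a stand-in for accept() and thread creation, and the two raise() calls simulate pressing Ctrl-C twice; serve_until_sigint is a hypothetical name.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stddef.h>

static volatile sig_atomic_t stop_requested;

static void on_sigint(int signo)
{
    (void)signo;
    stop_requested = 1;   /* the only work done in the handler */
}

/* Runs the loop until SIGINT is seen; returns the iterations completed. */
int serve_until_sigint(int max_iterations)
{
    struct sigaction sa;
    int done = 0;
    int rc;

    stop_requested = 0;
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    rc = sigaction(SIGINT, &sa, NULL);
    assert(rc == 0);

    while (!stop_requested && done < max_iterations) {
        /* Stand-in for: accept a connection, start a worker thread. */
        done++;
        if (done == 3) {
            raise(SIGINT);   /* simulated first Ctrl-C */
            raise(SIGINT);   /* simulated second Ctrl-C: the handler is
                                still installed, so nothing is killed */
        }
    }
    return done;
}
```

In a real server the loop is usually blocked in accept(), so installing the handler without SA_RESTART (as here) matters: accept() then fails with EINTR and the loop condition gets re-evaluated.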
What is not working? Don't do complex or long-lasting processing inside signal handlers.
Please show your code...
I would like to know exactly how the execution of asynchronous signal handlers works on Linux. First, I am unclear as to which thread executes the signal handler. Second, I would like to know the steps that are followed to make the thread execute the signal handler.
On the first matter, I have read two different, seemingly conflicting, explanations:
The Linux Kernel, by Andries Brouwer, §5.2 "Receiving signals" states:
When a signal arrives, the process is interrupted, the current registers are saved, and the signal handler is invoked. When the signal handler returns, the interrupted activity is continued.
The StackOverflow question "Dealing With Asynchronous Signals In Multi Threaded Program" leads me to think that Linux's behavior is like SCO Unix's:
When a signal is delivered to a process, if it is being caught, it will be handled by one, and only one, of the threads meeting either of the following conditions:
A thread blocked in a sigwait(2) system call whose argument does include the type of the caught signal.
A thread whose signal mask does not include the type of the caught signal.
Additional considerations:
A thread blocked in sigwait(2) is given preference over a thread not blocking the signal type.
If more than one thread meets these requirements (perhaps two threads are calling sigwait(2)), then one of them will be chosen. This choice is not predictable by application programs.
If no thread is eligible, the signal will remain ``pending'' at the process level until some thread becomes eligible.
Also, "The Linux Signals Handling Model" by Moshe Bar states "Asynchronous signals are delivered to the first thread found not blocking the signal.", which I interpret to mean that the signal is delivered to some thread having its sigmask not including the signal.
Which one is correct?
On the second matter, what happens to the stack and register contents for the selected thread? Suppose the thread-to-run-the-signal-handler T is in the middle of executing a do_stuff() function. Is thread T's stack used directly to execute the signal handler (i.e. the address of the signal trampoline is pushed onto T's stack and control flow goes to the signal handler)? Alternatively, is a separate stack used? How does it work?
These two explanations really aren't contradictory if you take into account the fact that Linux hackers tend to be confused about the difference between a thread and a process, mainly due to the historical mistake of trying to pretend threads could be implemented as processes that share memory. :-)
With that said, explanation #2 is much more detailed, complete, and correct.
As for the stack and register contents, each thread can register its own alternate signal-handling stack, and the process can choose on a per-signal basis which signals will be delivered on alternate signal-handling stacks. The interrupted context (registers, signal mask, etc.) will be saved in a ucontext_t structure on the (possibly alternate) stack for the thread, along with the trampoline return address. Signal handlers installed with the SA_SIGINFO flag are able to examine this ucontext_t structure if they like, but the only portable thing they can do with it is examine (and possibly modify) the saved signal mask. (I'm not sure if modifying it is sanctioned by the standard, but it's very useful because it allows the signal handler to atomically replace the interrupted code's signal mask upon return, for instance to leave the signal blocked so it can't happen again.)
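The trick mentioned in the parenthesis (editing the saved signal mask so the signal stays blocked after the handler returns) might look like the sketch below. It relies on behaviour observed on Linux, where the mask in the saved ucontext_t is what gets restored on return; as noted above, it is not clear that the standard sanctions this. The function names are invented for the example.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <ucontext.h>

/* SA_SIGINFO handler: the third argument points at the saved context. */
static void block_self_on_return(int signo, siginfo_t *info, void *context)
{
    ucontext_t *uc = context;

    (void)info;
    /* Edit the mask that will be restored when this handler returns,
     * so the interrupted code resumes with signo blocked. */
    sigaddset(&uc->uc_sigmask, signo);
}

/* Returns 1 if SIGUSR1 is blocked after its handler has returned. */
int signal_left_blocked_after_handler(void)
{
    struct sigaction sa;
    sigset_t current;
    int rc;

    sa.sa_sigaction = block_self_on_return;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;
    rc = sigaction(SIGUSR1, &sa, NULL);
    assert(rc == 0);

    raise(SIGUSR1);

    sigemptyset(&current);
    sigprocmask(SIG_BLOCK, NULL, &current);   /* query only */
    return sigismember(&current, SIGUSR1);
}
```

Calling sigprocmask() inside the handler would not have the same effect, because whatever mask the handler sets is replaced by the saved one when it returns.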
Source #1 (Andries Brouwer) is correct for a single-threaded process. Source #2 (SCO Unix) is wrong for Linux, because Linux does not prefer threads in sigwait(2). Moshe Bar is correct about the first available thread.
Which thread gets the signal? Linux's manual pages are a good reference. A process uses clone(2) with CLONE_THREAD to create multiple threads. These threads belong to a "thread group" and share a single process ID. The manual for clone(2) says,
Signals may be sent to a thread group as a whole (i.e., a
TGID) using kill(2), or to a specific thread (i.e., TID) using
tgkill(2).
Signal dispositions and actions are process-wide: if an
unhandled signal is delivered to a thread, then it will affect
(terminate, stop, continue, be ignored in) all members of the
thread group.
Each thread has its own signal mask, as set by sigprocmask(2),
but signals can be pending either: for the whole process
(i.e., deliverable to any member of the thread group), when
sent with kill(2); or for an individual thread, when sent with
tgkill(2). A call to sigpending(2) returns a signal set that
is the union of the signals pending for the whole process and
the signals that are pending for the calling thread.
If kill(2) is used to send a signal to a thread group, and the
thread group has installed a handler for the signal, then the
handler will be invoked in exactly one, arbitrarily selected
member of the thread group that has not blocked the signal.
If multiple threads in a group are waiting to accept the same
signal using sigwaitinfo(2), the kernel will arbitrarily
select one of these threads to receive a signal sent using
kill(2).
Linux is not SCO Unix, because Linux might give the signal to any thread, even if some threads are waiting for a signal (with sigwaitinfo, sigtimedwait, or sigwait) and some threads are not. The manual for sigwaitinfo(2) warns,
In normal usage, the calling program blocks the signals in set via a
prior call to sigprocmask(2) (so that the default disposition for
these signals does not occur if they become pending between
successive calls to sigwaitinfo() or sigtimedwait()) and does not
establish handlers for these signals. In a multithreaded program,
the signal should be blocked in all threads, in order to prevent the
signal being treated according to its default disposition in a thread
other than the one calling sigwaitinfo() or sigtimedwait()).
The code to pick a thread for the signal lives in linux/kernel/signal.c (the link points to GitHub's mirror). See the functions wants_signal() and complete_signal(). The code picks the first available thread for the signal. An available thread is one that doesn't block the signal and has no other signals in its queue. The code happens to check the main thread first, then it checks the other threads in some order unknown to me. If no thread is available, then the signal is stuck until some thread unblocks the signal or empties its queue.
What happens when a thread gets the signal? If there is a signal handler, then the kernel causes the thread to call the handler. Most handlers run on the thread's stack. A handler can run on an alternate stack if the process uses sigaltstack(2) to provide the stack, and sigaction(2) with SA_ONSTACK to set the handler. The kernel pushes some things onto the chosen stack, and sets some of the thread's registers.
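A small sketch of the alternate-stack case: the handler checks whether one of its local variables lives inside the buffer registered with sigaltstack(). The 64 KiB size, the use of SIGUSR1 and the function names are assumptions for the demo, and the buffer is left allocated and registered afterwards.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>

#define ALT_STACK_SIZE (64 * 1024)   /* assumed to exceed MINSIGSTKSZ */

static char *alt_base;
static volatile sig_atomic_t ran_on_alt_stack;

static void where_am_i(int signo)
{
    char local;                            /* lives on the handler's stack */
    uintptr_t here = (uintptr_t)&local;
    uintptr_t low = (uintptr_t)alt_base;

    (void)signo;
    ran_on_alt_stack = (here >= low && here < low + ALT_STACK_SIZE);
}

/* Returns 1 if the handler executed on the alternate stack. */
int handler_runs_on_alt_stack(void)
{
    stack_t ss;
    struct sigaction sa;

    assert((size_t)ALT_STACK_SIZE >= (size_t)MINSIGSTKSZ);
    alt_base = malloc(ALT_STACK_SIZE);
    if (alt_base == NULL)
        return -1;

    /* Tell the kernel where the alternate stack is. */
    ss.ss_sp = alt_base;
    ss.ss_size = ALT_STACK_SIZE;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) == -1)
        return -1;

    /* Ask for this particular handler to use it. */
    sa.sa_handler = where_am_i;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_ONSTACK;
    if (sigaction(SIGUSR1, &sa, NULL) == -1)
        return -1;

    raise(SIGUSR1);
    return ran_on_alt_stack;
}
```

Without SA_ONSTACK in sa_flags the same handler would run on the interrupted thread's own stack, even though an alternate stack has been registered.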
To run the handler, the thread must be running in userspace. If the thread is running in the kernel (perhaps for a system call or a page fault), then it does not run the handler until it goes to userspace. The kernel can interrupt some system calls, so the thread runs the handler now, without waiting for the system call to finish.
The signal handler is a C function, so the kernel obeys the architecture's convention for calling C functions. Each architecture, like arm, i386, powerpc, or sparc, has its own convention. For powerpc, to call handler(signum), the kernel sets the register r3 to signum. The kernel also sets the handler's return address to the signal trampoline. The return address goes on the stack or in a register by convention.
The kernel puts one signal trampoline in each process. This trampoline calls sigreturn(2) to restore the thread. In the kernel, sigreturn(2) reads some information (like saved registers) from the stack. The kernel had pushed this information on the stack before calling the handler. If there was an interrupted system call, the kernel might restart the call (only if the handler used SA_RESTART), or fail the call with EINTR, or return a short read or write.
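The difference between restarting the call and failing it with EINTR can be seen with a read() on an empty pipe. This is a sketch under a few assumptions: a 100 ms one-shot timer interrupts the blocked read(), and the handler writes one byte into the pipe so that a restarted read() has something to return. feed_pipe and blocked_read_result are made-up names.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/time.h>
#include <unistd.h>

static int wake_fd = -1;

static void feed_pipe(int signo)
{
    int saved_errno = errno;
    ssize_t n;

    (void)signo;
    n = write(wake_fd, "x", 1);   /* lets a restarted read() succeed */
    (void)n;
    errno = saved_errno;
}

/* Blocks in read() until SIGALRM arrives. Returns the number of bytes
 * read, or -errno if read() failed. */
int blocked_read_result(int sa_flags)
{
    struct sigaction sa;
    struct itimerval timer;
    int fds[2];
    char byte;
    ssize_t n;
    int result;

    if (pipe(fds) == -1)
        return -errno;
    wake_fd = fds[1];

    sa.sa_handler = feed_pipe;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = sa_flags;       /* 0 or SA_RESTART */
    if (sigaction(SIGALRM, &sa, NULL) == -1)
        return -errno;

    timer.it_interval.tv_sec = 0;
    timer.it_interval.tv_usec = 0;
    timer.it_value.tv_sec = 0;
    timer.it_value.tv_usec = 100000;   /* fire once, after 100 ms */
    if (setitimer(ITIMER_REAL, &timer, NULL) == -1)
        return -errno;

    n = read(fds[0], &byte, 1);        /* blocks: the pipe is empty */
    result = (n == -1) ? -errno : (int)n;

    close(fds[0]);
    close(fds[1]);
    assert(result != 0);               /* the write end was still open */
    return result;
}
```

With sa_flags set to 0 the call is expected to fail with EINTR even though the handler has already put a byte in the pipe; with SA_RESTART the kernel re-issues the read() after the handler returns and it picks up that byte.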
I am writing a basic user level thread library. The function prototype for thread creation is
thr_create (start_func_pointer,arg)
{
make_context(context_1,start_func)
}
start_func is user defined and can change depending on the user/program.
Once a thread has been created, if I start executing it using
swapcontext(context_1,context_2)
the function start_func starts running. Now, if a signal comes in, I need to handle it. Unfortunately, I only have the handle to start_func, so I can't really define a signal action inside start_func.
Is there a way I can attach a signal handling structure to start_func and point it to my code? Something like this:
thr_create (start_func_pointer,arg)
{
start_func.add_signal_hanlding_Structure = my_signal_handler();
make_context(context_1,start_func)
}
Does anybody know how POSIX does it?
If you are talking about catching real signals from the actual operating system you are running on I believe that you are going to have to do this application wide and then pass the signals on down into each thread (more on this later). The problem with this is that it gets complicated if two (or more) of your threads are trying to use alarm which uses SIGALRM -- when the real signal happens you can catch it, but then who do you deliver it to (one or all of the threads?).
If you are talking about sending and catching signals just among the threads within a program using your library, then sending a signal to a thread would cause it to be marked ready to run, even if it were waiting on something else previously, and then any signal handling functionality would be called from your thread resume code. If I remember from your previous questions, you had a function called thread_yield which was called to allow the next thread to run. If this is the case, then thread_yield needs to check a list of pending signals and perform their actions before returning to wherever thread_yield was called (unless one of the signal handlers involved killing the current thread, in which case you have to do something different).
As far as how to implement registering of signal handlers, in POSIX that is done by system calls made by the main function (either directly or indirectly). So you could have:
static int foo_flag = 0;
static void foo_handle(int sig) {
foo_flag = 1;
}
int start_func(void * arg) {
thread_sig_register(SIGFOO, foo_handle);
thread_pause();
// this is a function that you could write that would cause the current thread
// to mark itself as not ready to run and then call thread_yield, so that
// thread_pause() will return only after something else (a signal) causes the
// thread to become ready to run again.
if (foo_flag) {
printf("I got SIGFOO\n");
} else {
printf("I don't know what woke me up\n");
}
return 0;
}
Now, from another thread you can send this thread a SIGFOO (which is just a signal I made up for demonstration purposes).
Each of your thread control blocks (or whatever you are calling them) will have to have a signal handler table (or list, or something) and a pending signal list or a way to mark the signals as pending. The pending signals will be examined (possibly in some priority based order) and the handler action is done for each pending signal before returning to that thread's normal code.
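One possible shape for that per-thread table, as a sketch only: a fixed array of handlers plus a pending bitmask in each control block. Everything here is hypothetical (thread_tcb, THREAD_NSIG, the value of SIGFOO), and unlike the two-argument thread_sig_register() used earlier, these functions take the control block explicitly so the example does not need a notion of "current thread".

```c
#include <assert.h>
#include <stddef.h>

#define THREAD_NSIG 8   /* hypothetical number of library-level signals */
#define SIGFOO 3        /* made-up signal number for the demo */

typedef void (*thread_sig_handler)(int sig);

typedef struct {
    thread_sig_handler handlers[THREAD_NSIG];   /* signal handler table */
    unsigned pending;                            /* bit i set: signal i pending */
    int ready;                                   /* 1 if the thread may run */
} thread_tcb;

int thread_sig_register(thread_tcb *t, int sig, thread_sig_handler handler)
{
    if (sig < 0 || sig >= THREAD_NSIG)
        return -1;
    t->handlers[sig] = handler;
    return 0;
}

/* Mark the signal pending and make the target runnable again. */
int thread_sig_send(thread_tcb *t, int sig)
{
    if (sig < 0 || sig >= THREAD_NSIG)
        return -1;
    t->pending |= 1u << sig;
    t->ready = 1;
    return 0;
}

/* Meant to be called from the resume path (e.g. the end of thread_yield)
 * before returning to the thread's normal code. Lower numbers run first.
 * Returns the number of handlers invoked. */
int thread_sig_dispatch(thread_tcb *t)
{
    int sig, delivered = 0;

    for (sig = 0; sig < THREAD_NSIG; sig++) {
        if (!(t->pending & (1u << sig)))
            continue;
        t->pending &= ~(1u << sig);
        if (t->handlers[sig] != NULL) {
            t->handlers[sig](sig);
            delivered++;
        }
    }
    return delivered;
}

static int foo_flag;

static void foo_handle(int sig)
{
    (void)sig;
    foo_flag = 1;
}

/* Register, send and dispatch SIGFOO on one control block. */
int demo_sigfoo(void)
{
    thread_tcb t = { { NULL }, 0, 0 };

    foo_flag = 0;
    thread_sig_register(&t, SIGFOO, foo_handle);
    thread_sig_send(&t, SIGFOO);
    assert(t.ready == 1);
    return thread_sig_dispatch(&t) == 1 && foo_flag == 1 && t.pending == 0;
}
```

Pending signals without a registered handler are simply dropped in this sketch; a real library would need to decide on a default action for them.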