How to properly suspend multiple threads with posix signals? - c

In the context of an existing multi-threaded application, I want to suspend a list of threads for a specific duration and then resume their normal execution. I know some of you will say that I should not do that, but I am aware of the drawbacks and I don't have a choice.
I came up with the following code, which sort of works but fails randomly. For each thread I want to suspend, I send a signal and wait for an ack via a semaphore. When invoked, the signal handler posts the semaphore and sleeps for the specified duration.
The problem is that when the system is fully loaded, the call to sem_timedwait sometimes fails with ETIMEDOUT, and I am left with inconsistent state in the semaphore used for the ack: I don't know whether the signal has been dropped or is just late.
// compiled with: gcc main.c -o test -pthread
#include <pthread.h>
#include <stdio.h>
#include <signal.h>
#include <errno.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <semaphore.h>
#include <sys/types.h>
#include <sys/syscall.h>
#define NUMTHREADS 40
#define SUSPEND_SIG (SIGRTMIN+1)
#define SUSPEND_DURATION 80 // in ms
static sem_t sem;
void checkResults(const char *msg, int rc) {
if (rc == 0) {
//printf("%s success\n", msg);
} else if (rc == ESRCH) {
printf("%s failed with ESRCH\n", msg);
} else if (rc == EINVAL) {
printf("%s failed with EINVAL\n", msg);
} else {
printf("%s failed with unknown error: %d\n", msg, rc);
}
}
static void suspend_handler(int signo) {
sem_post(&sem);
usleep(SUSPEND_DURATION*1000);
}
void installSuspendHandler() {
struct sigaction sa;
memset(&sa, 0, sizeof(sa));
sigemptyset(&sa.sa_mask);
sa.sa_flags = 0;
sa.sa_handler = suspend_handler;
int rc = sigaction(SUSPEND_SIG, &sa, NULL);
checkResults("sigaction SUSPEND", rc);
}
void *threadfunc(void *param) {
int tid = *((int *) param);
free(param);
printf("Thread %d entered\n", tid);
// this is an example workload, the real app is doing many things
while (1) {
int rc = sleep(30);
if (rc != 0 && errno == EINTR) {
//printf("Thread %d got a signal delivered to it\n", tid);
} else {
//printf("Thread %d did not get expected results! rc=%d, errno=%d\n", tid, rc, errno);
}
}
return NULL;
}
int main(int argc, char **argv) {
pthread_t threads[NUMTHREADS];
int i;
sem_init(&sem, 0, 0);
installSuspendHandler();
for(i=0; i<NUMTHREADS; ++i) {
int *arg = malloc(sizeof(*arg));
if ( arg == NULL ) {
fprintf(stderr, "Couldn't allocate memory for thread arg.\n");
exit(EXIT_FAILURE);
}
*arg = i;
int rc = pthread_create(&threads[i], NULL, threadfunc, arg);
checkResults("pthread_create()", rc);
}
sleep(3);
printf("Will start to send signals...\n");
while (1) {
printf("***********************************************\n");
for(i=0; i<NUMTHREADS; ++i) {
int rc = pthread_kill(threads[i], SUSPEND_SIG);
checkResults("pthread_kill()", rc);
printf("Waiting for Semaphore for thread %d ...\n", i);
// compute timeout abs timestamp for ack
struct timespec ts;
clock_gettime(CLOCK_REALTIME, &ts);
const int TIMEOUT = SUSPEND_DURATION*1000*1000; // in nano-seconds
ts.tv_nsec += TIMEOUT; // timeout to receive ack from signal handler
// normalize timespec
ts.tv_sec += ts.tv_nsec / 1000000000;
ts.tv_nsec %= 1000000000;
rc = sem_timedwait(&sem, &ts); // try decrement semaphore
if (rc == -1 && errno == ETIMEDOUT) {
// timeout
// semaphore is out of sync
printf("Did not receive signal handler sem_post before timeout of %d ms for thread %d\n", TIMEOUT/1000000, i);
abort();
}
checkResults("sem_timedwait", rc);
printf("Received Semaphore for thread %d.\n", i);
}
sleep(1);
}
for(i=0; i<NUMTHREADS; ++i) {
int rc = pthread_join(threads[i], NULL);
checkResults("pthread_join()\n", rc);
}
printf("Main completed\n");
return 0;
}
Questions?
Is it possible for a signal to be dropped and never delivered?
What causes the timeout on the semaphore at random time when the system is loaded?

usleep() is not among the async-signal-safe functions (though sleep() is, and there are other async-signal-safe functions by which you can produce a timed delay). A program that calls usleep() from a signal handler is therefore non-conforming. The specifications do not describe what may happen -- neither with such a call itself nor with the larger program execution in which it occurs. Your questions can be answered only for a conforming program; I do that below.
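As a minimal sketch of such a delay (the helper names now_ms() and delay_ms() are hypothetical, not from the question), the handler could wait using only clock_gettime() and poll(), both of which POSIX.1-2008 lists as async-signal-safe. This assumes a POSIX.1-2008 system and is an illustration, not a drop-in fix for the whole design:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <time.h>

/* Current CLOCK_MONOTONIC time in whole milliseconds. */
static long long now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Delays the caller for at least `ms` milliseconds using only
   clock_gettime() and poll(). errno is saved and restored so the
   interrupted code does not see it change. */
static void delay_ms(int ms)
{
    const int saved_errno = errno;
    const long long deadline = now_ms() + ms;
    long long left;

    /* poll() with no descriptors is just a timeout. It may return
       early with EINTR, so recompute the remaining time and retry. */
    while ((left = deadline - now_ms()) > 0)
        poll(NULL, 0, (int)left);

    errno = saved_errno;
}
```

In the question's handler, delay_ms(SUSPEND_DURATION) would take the place of the usleep() call; sem_post() is already async-signal-safe.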
Is it possible for a signal to be dropped and never delivered?
It depends on what exactly you mean:
If a normal (not real-time) signal is sent to a thread that already has that signal pending, then no additional instance is queued.
A thread can die with signals still queued for it; those signals will not be handled.
A thread can change a given signal's disposition (to SIG_IGN, for example), though this is a per-process attribute, not a per-thread one.
A thread can block a signal indefinitely. A blocked signal is not dropped -- it remains queued for the thread and will eventually be received some time after it is unblocked, if that ever happens.
But no, once a signal has been successfully queued via kill(), raise(), or (as in your code) pthread_kill(), it will not be randomly dropped.
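The difference between normal and real-time signals can be observed with a small experiment. The sketch below (count_deliveries() is a hypothetical helper) blocks a signal, raises it several times, unblocks it, and counts how often the handler ran. It assumes Linux, where real-time signals sent with raise() are queued; POSIX itself only guarantees queuing for signals sent with sigqueue().

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t deliveries;

static void count_handler(int signo)
{
    (void)signo;
    deliveries++;
}

/* Blocks signo in the calling thread, raises it `times` times,
   unblocks it, and returns how many times the handler ran.
   Returns -1 on error. */
int count_deliveries(int signo, int times)
{
    struct sigaction sa, old_sa;
    sigset_t set, saved;
    int i;

    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = count_handler;
    if (sigaction(signo, &sa, &old_sa) == -1)
        return -1;

    sigemptyset(&set);
    sigaddset(&set, signo);
    if (pthread_sigmask(SIG_BLOCK, &set, &saved) != 0)
        return -1;

    deliveries = 0;
    for (i = 0; i < times; i++)
        raise(signo);                     /* stays pending while blocked */

    /* Pending instances are delivered when the signal is unblocked. */
    pthread_sigmask(SIG_UNBLOCK, &set, NULL);

    pthread_sigmask(SIG_SETMASK, &saved, NULL);
    sigaction(signo, &old_sa, NULL);
    return deliveries;
}
```

On Linux, calling this with SIGUSR1 and three raises should report a single delivery, while SIGRTMIN+1 (the kind of signal the question uses) should report all three.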
What causes the timeout on the semaphore at random time when the system is loaded?
A thread can receive a signal only when it is actually running on a core. On a system with more runnable processes than cores, some runnable processes must be suspended, without a timeslice on any core, at any given time. On a heavily-loaded system, that's the norm. Signals are asynchronous, so you can send one to a thread that is currently waiting for a timeslice without the sender blocking. It is entirely possible, then, that the thread you have signaled does not get scheduled to run before the timeout expires. If it does run, it may have the signal blocked for one reason or another, and not get around to unblocking it before it uses up its timeslice.
Ultimately, you can use your semaphore-based approach to check whether the target thread handled the signal within any timeout of your choice, but you cannot predict in advance how long it will take for the thread to handle the signal, nor even whether it will do so in any finite amount of time (for example, it could die for one reason or another before doing so).
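The "did the thread acknowledge within the timeout" check can be sketched as a small helper. sem_wait_ms() is a hypothetical name; the sketch builds the absolute CLOCK_REALTIME deadline that sem_timedwait() expects, and it retries a wait that was interrupted by an unrelated signal (EINTR) instead of reporting it as a failure:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <semaphore.h>
#include <time.h>

/* Waits up to timeout_ms milliseconds for the semaphore.
   Returns 0 if it was decremented, ETIMEDOUT if the deadline passed,
   or another errno value on failure. */
int sem_wait_ms(sem_t *sem, long timeout_ms)
{
    struct timespec deadline;

    if (clock_gettime(CLOCK_REALTIME, &deadline) == -1)
        return errno;

    deadline.tv_sec  += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    /* The deadline is absolute, so retrying after EINTR does not
       extend the total waiting time. */
    while (sem_timedwait(sem, &deadline) == -1) {
        if (errno != EINTR)
            return errno;
    }
    return 0;
}
```

A return value of ETIMEDOUT only says the ack has not arrived yet. The handler may still post the semaphore later, so the caller has to account for such a late post before reusing the semaphore for the next thread.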

Related

pthread_sigmask() does not work in a multithreaded program

I'm a newbie in C development. While learning multi-threaded programming, I noticed a problem: I install a signal handler in the main thread, and when I try to block that signal from the child thread, it does not work.
Here is a brief version of the code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>
#include <pthread.h>
#include <unistd.h>
#include <signal.h>
void *thread_start(void *_arg) {
sleep(2);
sigset_t mask;
sigemptyset(&mask);
sigaddset(&mask, SIGUSR2);
pthread_sigmask(SIG_BLOCK, &mask, NULL);
printf("child-thread executed\n");
while (true) {
sleep(1);
}
return NULL;
}
void sig_handler(int _sig) {
printf("executed\n");
}
int main(int argc, char *argv[]) {
pthread_t t_id;
int s = pthread_create(&t_id, NULL, thread_start, NULL);
if (s != 0) {
char *msg = strerror(s);
printf("%s\n", msg);
}
printf("main-thread executed, create [%lu]\n", t_id);
signal(SIGUSR2, sig_handler);
while (true) {
sleep(1);
}
return EXIT_SUCCESS;
}
The signal mask is a per-thread property: a thread inherits the mask of its creating thread at the time of thread creation but, after that, it controls its own copy.
In other words, blocking a signal in a thread only affects the delivery of signals for that thread, not for any other.
In any case, even if it were shared (it's not), you would have a potential race condition since you start the child thread before setting up the signal in the main thread. Hence it would be indeterminate as to whether the order was "parent sets up signal, then child blocks" or vice versa. But, as stated, that's irrelevant due to the thread-specific nature of the signal mask.
If you want a thread to control the signal mask of another thread, you will need to use some form of inter-thread communication to let the other thread do it itself.
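Both halves of that statement (inheritance at creation, independence afterwards) can be checked with a short probe. In this sketch, probe_masks() and struct mask_probe are hypothetical names; the creator blocks one signal before pthread_create(), and the new thread then blocks a second signal on its own:

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>

struct mask_probe {
    int inherited_signo;  /* blocked by the creator before pthread_create() */
    int own_signo;        /* blocked by the new thread itself */
    int saw_inherited;    /* set by the new thread: 1 if inherited_signo was blocked */
};

static void *probe_thread(void *arg)
{
    struct mask_probe *p = arg;
    sigset_t current, extra;

    /* A NULL new set only queries the calling thread's mask. */
    pthread_sigmask(SIG_BLOCK, NULL, &current);
    p->saw_inherited = sigismember(&current, p->inherited_signo);

    /* This changes the mask of this thread only. */
    sigemptyset(&extra);
    sigaddset(&extra, p->own_signo);
    pthread_sigmask(SIG_BLOCK, &extra, NULL);
    return NULL;
}

/* Returns 0 on success and fills in the two observations. */
int probe_masks(int inherited_signo, int own_signo,
                int *child_inherited, int *creator_affected)
{
    struct mask_probe p = { inherited_signo, own_signo, 0 };
    sigset_t set, saved, current;
    pthread_t tid;
    int rc;

    /* Creator: block inherited_signo, make sure own_signo is unblocked. */
    sigemptyset(&set);
    sigaddset(&set, inherited_signo);
    pthread_sigmask(SIG_BLOCK, &set, &saved);
    sigemptyset(&set);
    sigaddset(&set, own_signo);
    pthread_sigmask(SIG_UNBLOCK, &set, NULL);

    rc = pthread_create(&tid, NULL, probe_thread, &p);
    if (rc == 0) {
        pthread_join(tid, NULL);
        pthread_sigmask(SIG_BLOCK, NULL, &current);
        *child_inherited = p.saw_inherited;
        *creator_affected = sigismember(&current, own_signo);
    }
    pthread_sigmask(SIG_SETMASK, &saved, NULL);  /* restore the caller's mask */
    return rc;
}
```

With SIGUSR2 as the inherited signal and SIGUSR1 as the thread's own, the child should report that SIGUSR2 was blocked on entry, and the creator's mask should not contain SIGUSR1 afterwards.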
As I wrote in a comment, any USR2 signal sent to the process will be delivered using the main thread, because the other thread blocks it. Its output will not tell you exactly what happened, so it is not really a good way to test threads and signal masks. Additionally, it uses printf() in a signal handler, which may or may not work: printf() is not an async-signal-safe function, so it must not be used in a signal handler.
Here is a better example:
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <pthread.h>
#include <limits.h>
#include <string.h>
#include <errno.h>
#include <stdio.h>
/* This function writes a message directly to standard error,
without using the stderr stream. This is async-signal safe.
Returns 0 if success, errno error code if an error occurs.
errno is kept unchanged. */
static int write_stderr(const char *msg)
{
const char *end = msg;
const int saved_errno = errno;
int retval = 0;
ssize_t n;
/* If msg is non-NULL, find the string-terminating '\0'. */
if (msg)
while (*end)
end++;
/* Write the message to standard error. */
while (msg < end) {
n = write(STDERR_FILENO, msg, (size_t)(end - msg));
if (n > 0) {
msg += n;
} else
if (n != -1) {
/* Bug, should not occur */
retval = EIO;
break;
} else
if (errno != EINTR) {
retval = errno;
break;
}
}
/* Paranoid check that exactly the message was written */
if (!retval)
if (msg != end)
retval = EIO;
errno = saved_errno;
return retval;
}
static volatile sig_atomic_t done = 0;
pthread_t main_thread;
pthread_t other_thread;
static void signal_handler(int signum)
{
const pthread_t id = pthread_self();
const char *thread = (id == main_thread) ? "Main thread" :
(id == other_thread) ? "Other thread" : "Unknown thread";
const char *event = (signum == SIGHUP) ? "HUP" :
(signum == SIGUSR1) ? "USR1" :
(signum == SIGINT) ? "INT" :
(signum == SIGTERM) ? "TERM" : "Unknown signal";
if (signum == SIGTERM || signum == SIGINT)
done = 1;
write_stderr(thread);
write_stderr(": ");
write_stderr(event);
write_stderr(".\n");
}
static int install_handler(int signum)
{
struct sigaction act;
memset(&act, 0, sizeof act);
sigemptyset(&act.sa_mask);
act.sa_handler = signal_handler;
act.sa_flags = 0;
if (sigaction(signum, &act, NULL) == -1)
return -1;
return 0;
}
void *other(void *unused __attribute__((unused)))
{
sigset_t mask;
sigemptyset(&mask);
sigaddset(&mask, SIGTERM);
sigaddset(&mask, SIGHUP);
pthread_sigmask(SIG_BLOCK, &mask, NULL);
while (!done)
sleep(1);
return NULL;
}
int main(void)
{
pthread_attr_t attrs;
sigset_t mask;
int result;
main_thread = pthread_self();
other_thread = pthread_self(); /* Just to initialize it to a sane value */
/* Install HUP, USR1, INT, and TERM signal handlers. */
if (install_handler(SIGHUP) ||
install_handler(SIGUSR1) ||
install_handler(SIGINT) ||
install_handler(SIGTERM)) {
fprintf(stderr, "Cannot install signal handlers: %s.\n", strerror(errno));
return EXIT_FAILURE;
}
/* Create the other thread. */
pthread_attr_init(&attrs);
pthread_attr_setstacksize(&attrs, 2*PTHREAD_STACK_MIN);
result = pthread_create(&other_thread, &attrs, other, NULL);
pthread_attr_destroy(&attrs);
if (result) {
fprintf(stderr, "Cannot create a thread: %s.\n", strerror(result));
return EXIT_FAILURE;
}
/* This thread blocks SIGUSR1. */
sigemptyset(&mask);
sigaddset(&mask, SIGUSR1);
pthread_sigmask(SIG_BLOCK, &mask, NULL);
/* Ready to handle signals. */
printf("Send a HUP, USR1, or TERM signal to process %d.\n", (int)getpid());
fflush(stdout);
while (!done)
sleep(1);
pthread_join(other_thread, NULL);
return EXIT_SUCCESS;
}
Save it as e.g. example.c, and compile and run using
gcc -Wall -O2 example.c -pthread -o exprog
./exprog
It will block the USR1 signal in the main thread, and HUP and TERM in the other thread. It will also catch the INT signal (Ctrl+C), which is not blocked in either thread. When you send it the INT or TERM signal, the program will exit.
If you send the program the USR1 signal, you'll see that it will always be delivered using the other thread.
If you send the program a HUP signal, you'll see that it will always be delivered using the main thread.
If you send the program a TERM signal, it too will be delivered using the main thread, but it will also cause the program to exit (nicely).
If you send the program an INT signal, it will be delivered using one of the threads. It depends on several factors whether you'll always see it being delivered using the same thread or not, but at least in theory, it can be delivered using either thread. This signal too will cause the program to exit (nicely).
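The same routing rule can be checked programmatically rather than by sending signals from a shell. The following sketch (handled_by_other_thread() is a hypothetical helper) blocks a signal in the calling thread, starts one thread that unblocks it, sends the signal to the whole process with kill(), and reports which thread ran the handler. It assumes no other thread in the process has the signal unblocked:

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t handled;
static pthread_t handled_in;             /* thread that ran the handler */

static void record_thread(int signo)
{
    (void)signo;
    handled_in = pthread_self();
    handled = 1;
}

static void *unblocked_thread(void *arg)
{
    sigset_t set;

    sigemptyset(&set);
    sigaddset(&set, *(int *)arg);
    pthread_sigmask(SIG_UNBLOCK, &set, NULL);  /* only this thread accepts it */
    while (!handled)
        poll(NULL, 0, 10);                     /* wait 10 ms, or until interrupted */
    return NULL;
}

/* Returns 1 if a process-directed signo was handled in the helper
   thread, 0 if it was handled elsewhere, -1 on error. */
int handled_by_other_thread(int signo)
{
    struct sigaction sa, old_sa;
    sigset_t set, saved;
    pthread_t tid;
    int result = -1;

    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = record_thread;
    if (sigaction(signo, &sa, &old_sa) == -1)
        return -1;

    sigemptyset(&set);
    sigaddset(&set, signo);
    pthread_sigmask(SIG_BLOCK, &set, &saved);  /* the calling thread refuses signo */
    handled = 0;

    if (pthread_create(&tid, NULL, unblocked_thread, &signo) == 0) {
        kill(getpid(), signo);                 /* process-directed signal */
        pthread_join(tid, NULL);
        result = pthread_equal(handled_in, tid) ? 1 : 0;
    }
    sigaction(signo, &old_sa, NULL);
    pthread_sigmask(SIG_SETMASK, &saved, NULL);
    return result;
}
```

Because the calling thread keeps the signal blocked for the whole experiment, the helper thread is the only candidate for delivery, which mirrors the USR1 and HUP cases described above.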

Why some threads don't receive pthread_cond_broadcast?

I have a threadpool of workers. Each worker executes this routine:
void* worker(void* args){
...
pthread_mutex_lock(&mtx);
while (queue == NULL && stop == 0){
pthread_cond_wait(&cond, &mtx);
}
el = pop(queue);
pthread_mutex_unlock(&mtx);
...
}
main thread:
int main(){
...
while (stop == 0){
...
pthread_mutex_lock(&mtx);
insert(queue, el);
pthread_cond_signal(&cond);
pthread_mutex_unlock(&mtx);
...
}
...
}
Then I have a signal handler that executes this code when it receives a signal:
void exit_handler(){
stop = 1;
pthread_mutex_lock(&mtx);
pthread_cond_broadcast(&cond);
pthread_mutex_unlock(&mtx);
}
I have omitted declarations and initialization, but the original code has them.
After a signal is received, most of the time everything is fine, but sometimes it seems that some worker threads stay in the wait loop because they don't see that the variable stop has changed and/or they are not woken up by the broadcast.
So the threads never end.
What am I missing?
EDIT: stop=1 moved inside the critical section in exit_handler. The issue remains.
EDIT2: I was executing the program on a VM with Ubuntu. Since the code appears to be totally right, I tried changing the VM and OS (XUbuntu) and now it seems to work correctly. I still don't know why; does anyone have an idea?
Some guessing here, but it's too long for a comment, so if this is wrong, I will delete. I think you may have a misconception about how pthread_cond_broadcast works (at least something I've been burned with in the past). From the man page:
The pthread_cond_broadcast() function shall unblock all threads
currently blocked on the specified condition variable cond.
OK, that makes sense: _broadcast awakens all threads currently blocked on cond. However, only one of the awakened threads will then be able to lock the mutex after they're all awoken. Also from the man page:
The thread(s) that are unblocked shall contend for the mutex according
to the scheduling policy (if applicable), and as if each had called
pthread_mutex_lock().
So this means that if 3 threads are blocked on cond and _broadcast is called, all 3 threads will wake up, but only 1 can grab the mutex. The other 2 will still be stuck in pthread_cond_wait, waiting on a signal. Because of this, they don't see stop set to 1, and exit_handler (I'm assuming a Ctrl+c software signal?) is done signaling, so the remaining threads that lost the _broadcast competition are stuck in limbo, waiting on a signal that will never come, and unable to read that the stop flag has been set.
I think there are 2 options to work-around/fix this:
Use pthread_cond_timedwait. Even without being signaled, this will return from waiting at the specified time interval, see that stop == 1, and then exit.
Add pthread_cond_signal or pthread_cond_broadcast at the end of your worker function. This way, right before a thread exits, it will signal the cond variable allowing any other waiting threads to grab the mutex and finish processing. There is no harm in signaling a conditional variable if no threads are waiting on it, so this should be fine even for the last thread.
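The first option can be sketched as follows. All names here are hypothetical stand-ins (pending_items replaces the real queue, and request_stop() is what a signal handler would call); the point is only that a bounded wait lets each worker recheck the stop flag even if it never receives a wakeup:

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <time.h>

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int pending_items;                /* stand-in for the real queue */
static volatile sig_atomic_t stop;

/* Only sets a flag, so it is safe to call from a signal handler. */
void request_stop(void)
{
    stop = 1;
}

/* Producer side: add one item and wake one waiting worker. */
void add_item(void)
{
    pthread_mutex_lock(&mtx);
    pending_items++;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&mtx);
}

/* Worker side: returns 1 if an item was taken, 0 if stop was seen
   while the queue was empty. The flag is rechecked at least every
   poll_ms milliseconds. */
int take_item_or_stop(long poll_ms)
{
    int got = 0;

    pthread_mutex_lock(&mtx);
    while (pending_items == 0 && !stop) {
        struct timespec deadline;

        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_sec  += poll_ms / 1000;
        deadline.tv_nsec += (poll_ms % 1000) * 1000000L;
        if (deadline.tv_nsec >= 1000000000L) {
            deadline.tv_sec++;
            deadline.tv_nsec -= 1000000000L;
        }
        /* Returns on wakeup or timeout; the loop rechecks either way. */
        pthread_cond_timedwait(&cond, &mtx, &deadline);
    }
    if (pending_items > 0) {
        pending_items--;
        got = 1;
    }
    pthread_mutex_unlock(&mtx);
    return got;
}
```

A side effect of this design is that the handler no longer needs to lock the mutex or broadcast at all, which matters because pthread_mutex_lock() and pthread_cond_broadcast() are not async-signal-safe. The price is a shutdown latency of up to poll_ms per worker.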
EDIT: Here is an MCVE that proves (as far as I can tell) that my answer above is wrong, heh. As soon as I press Ctrl+c, the program exits "immediately", which tells me all the threads are quickly acquiring the mutex after the broadcast, seeing that stop is true, and exiting. Then main joins the threads and the process is over.
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <stdbool.h>
#include <signal.h>
#include <unistd.h>
#include <time.h> // time(), used to seed rand()
#define NUM_THREADS 3
#define STACK_SIZE 10
pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t c = PTHREAD_COND_INITIALIZER;
volatile bool stop = false;
int stack[STACK_SIZE] = { 0 };
int sp = 0; // stack pointer, also doubles as the current stack size
void SigHandler(int sig)
{
if (sig == SIGINT)
{
stop = true;
}
else
{
printf("Received unexpected signal %d\n", sig);
}
}
void* worker(void* param)
{
long tid = (long)(param);
while (stop == false)
{
// acquire the lock
pthread_mutex_lock(&m);
while (sp <= 0) // sp should never be < 0
{
// there is no data in the stack to consume, wait to get signaled
// this unlocks the mutex when it is called, and locks the
// mutex before it returns
pthread_cond_wait(&c, &m);
}
// when we get here we should be guaranteed sp >= 1
printf("thread %ld consuming stack[%d] = %d\n", tid, sp-1, stack[sp-1]);
sp--;
pthread_mutex_unlock(&m);
int sleepVal = rand() % 10;
printf("thread %ld sleeping for %d seconds...\n", tid, sleepVal);
sleep(sleepVal);
}
pthread_exit(NULL);
}
int main(void)
{
pthread_t threads[NUM_THREADS];
pthread_attr_t attr;
pthread_attr_init(&attr);
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
srand(time(NULL));
for (long i=0; i<NUM_THREADS; i++)
{
int rc = pthread_create(&threads[i], &attr, worker, (void*)i);
if (rc != 0)
{
fprintf(stderr, "Failed to create thread %ld\n", i);
}
}
while (stop == false)
{
// produce data in bursts
int numValsToInsert = rand() % (STACK_SIZE - sp);
printf("main producing %d values\n", numValsToInsert);
// acquire the lock
pthread_mutex_lock(&m);
for (int i=0; i<numValsToInsert; i++)
{
// produce values for the stack
int val = rand() % 10000;
// I think this should already be guaranteed..?
if (sp+1 < STACK_SIZE)
{
printf("main pushing stack[%d] = %d\n", sp, val);
stack[sp++] = val;
// signal the workers that data is ready
//printf("main signaling threads...\n");
//pthread_cond_signal(&c);
}
else
{
printf("stack full!\n");
}
}
pthread_mutex_unlock(&m);
// signal the workers that data is ready
printf("main signaling threads...\n");
pthread_cond_broadcast(&c);
int sleepVal = 1;//rand() % 5;
printf("main sleeping for %d seconds...\n", sleepVal);
sleep(sleepVal);
}
for (long i=0; i<NUM_THREADS; i++)
{
pthread_join(threads[i], NULL);
}
return 0;
}

Wait for signal, then continue execution

I am trying to make a program that suspends its execution until a signal arrives. Then, after the signal arrives I just want my code to continue its execution from where it was. I don't want it to execute a function handler or whatsoever. Is there a simple way of doing this? I have been struggling for a week or so, reading here and there, and didn't manage to get a fully operative code.
In particular, I want the main program to create a thread that waits for some particular event to happen (e.g., a user has input some data to stdin). Meanwhile, the main program is doing something but at some point it suspends its execution until it receives a signal.
The signal may come from the thread because it has detected the event or it may be due to a timeout because I don't want it to wait for ever.
I have made some code but it does not work as expected...
/*
* This code SHOULD start a thread that gets messages from stdin.
* If the message is a "quit", the thread exits. Otherwise it raises
* a signal that should be caught by the main program.
* The main program simply waits for the message unless a timer of
* 5.5 seconds expires before receiving the signal from the thread.
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>
#include <sys/time.h>
#include <signal.h>
#define BSIZE 100 /* Buffer size */
sigset_t mask;
pthread_t tid;
//struct itimerval timervalue;
int milisec = 5500; /* Timeout: 5.5 seconds */
int end = 0;
char buffer[BSIZE];
//Function prototypes
void init(void);
void * thread_job(void *);
void message_rcvd(void);
void wait_for_message_or_timeout(void);
int start_timer(struct itimerval, int);
int stop_timer(struct itimerval);
void on_signal(int);
// MAIN: Wait for message or timeout
int main(int argc, char ** argv) {
init();
while(!end){
wait_for_message_or_timeout();
if(!end)
printf("Message received [%s]\n", buffer);
}
return 0;
}
// INIT: Initializes the signals that the program will wait for
// and creates a thread that will eventually generate a signal
void init()
{
/* Init the signals I want to wait for with sigwait() */
sigemptyset(&mask);
sigaddset(&mask, SIGUSR1);
sigaddset(&mask, SIGALRM);
sigprocmask(SIG_BLOCK, &mask, NULL);
//signal(SIGUSR1, SIG_IGN);
signal(SIGUSR1, on_signal);
/* Create the thread and put it to work */
pthread_t tid;
pthread_create(&tid, NULL, thread_job, NULL);
}
void on_signal(int signum){
printf("on_signal\n");
}
// THREAD CODE -------------
// THREAD JOB: When the user inputs a message, it passes the message
// to the main thread by invoking message_rcvd()
void * thread_job(){
int end = 0;
while(!end){
printf("Input message:");
if (fgets(buffer, BSIZE, stdin) != NULL)
message_rcvd();
}
}
// MESSAGE RECEIVED: If message is not equal to "quit" raise a signal
void message_rcvd(){
if(strcmp(buffer, "quit") == 0){
exit(0);
}else{
printf("Going to raise SIGUSR1...");
if(raise(SIGUSR1) == 0)
printf("raised!\n");
}
}
// WAIT: Should wait for signal SIGUSR1 for some time
void wait_for_message_or_timeout(){
int sigid;
struct itimerval t;
/* Set a timer to prevent waiting for ever*/
printf("Setting timer...\n");
start_timer(t, milisec);
/* Put the process to wait until signal arrives */
sigwait(&mask, &sigid);
switch(sigid){
case SIGUSR1:
printf("Received SIGUSR1: Message avaible!\n");
break;
case SIGALRM:
printf("Received SIGALRM: Timeout\n");
end = 1;
break;
default:
printf("Unknown signal received\n");
break;
}
printf("Stopping timer...\n");
/* Stop timer */
stop_timer(t);
}
// START TIMER: I don't want the timer to cause the execution
// of a handler function
int start_timer(struct itimerval timervalue, int msec)
//int start_timer(int msec)
{
timervalue.it_interval.tv_sec = msec / 1000;
timervalue.it_interval.tv_usec = (msec % 1000) * 1000;
timervalue.it_value.tv_sec = msec / 1000;
timervalue.it_value.tv_usec = (msec % 1000) * 1000;
if(setitimer(ITIMER_REAL, &timervalue, NULL))
{
printf("\nsetitimer() error\n");
return(-1);
}
return(0);
}
// STOP TIMER:
int stop_timer(struct itimerval timervalue)
//int stop_timer()
{
timervalue.it_interval.tv_sec = 0;
timervalue.it_interval.tv_usec = 0;
timervalue.it_value.tv_sec = 0;
timervalue.it_value.tv_usec = 0;
if(setitimer(ITIMER_REAL, &timervalue, NULL))
{
printf("\nsetitimer() error\n");
return(-1);
}
return(0);
}
Here is a typical execution of this code.
./signaltest
Setting timer...
Input message:hello
Going to raise SIGUSR1...raised!
Input message:friend
Going to raise SIGUSR1...raised!
Input message:Received SIGALRM: Timeout
Stopping timer...
As you can see, the signal SIGUSR1 is being raised and sigwait is being unblocked. However, the code does not seem to continue after the signal has been raised. (Note that I don't need a signal handler; I just added one for debugging purposes. I have blocked its execution with sigprocmask.)
Why is SIGUSR1 unblocking sigwait but the execution does not continue from there? Is there a way to make it continue after unblocking? This seems to work for SIGALRM but why not for SIGUSR1?
As I said, I have been looking at tons of Stack Overflow questions and online how-tos, and tried different system calls (e.g., pause, sigsuspend), but couldn't find a way to solve this :-(
If you are wondering why I don't make this code much simpler by not using a thread: this is not the code I am actually implementing, just a simpler example to make my question clearer. I am actually trying to implement a network protocol API for my own protocol, similar to the sockets API.
Thanks in advance
The SIGUSR1 signal isn't going where you think it is.
In a multithreaded program, the raise function sends a signal to the current thread, which is the thread_job thread in this case. So the main thread never sees the signal.
You need to save off thread ID of the main thread, then use pthread_kill to send a signal to that thread.
Add a new global:
pthread_t main_tid;
Then populate it in your init function before starting the new thread:
void init()
{
main_tid = pthread_self();
...
Then in message_rcvd, use pthread_kill:
if(pthread_kill(main_tid, SIGUSR1) == 0)
printf("raised!\n");
Also, remove the definition of end in thread_job, and remove the definition of tid in init. These local definitions shadow the global variables of the same name.
Sample output:
Setting timer...
Input message:hello
Going to raise SIGUSR1...raised!
Input message:Received SIGUSR1: Message avaible!
Stopping timer...
Message received [hello
]
Setting timer...
test
Going to raise SIGUSR1...raised!
Input message:Received SIGUSR1: Message avaible!
Stopping timer...
Message received [test
]
Setting timer...
Received SIGALRM: Timeout
Stopping timer...
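For the "wait for a signal or a timeout" part, an alternative to arming an interval timer is sigtimedwait(), which takes the timeout directly. The sketch below is an assumption-laden illustration, not the asker's code: wait_for_sigusr1() and notifier() are hypothetical names, and sigtimedwait() is assumed to be available (it is on Linux, but not on every POSIX-like system). It also shows the pthread_kill() fix from above in isolation:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <time.h>

static void *notifier(void *arg)
{
    pthread_t target = *(pthread_t *)arg;

    /* raise() here would signal this helper thread, not the waiter. */
    pthread_kill(target, SIGUSR1);
    return NULL;
}

/* Waits up to timeout_ms for SIGUSR1 in the calling thread. If
   start_notifier is nonzero, a helper thread sends the signal.
   Returns SIGUSR1 if it arrived, or -1 on timeout or error. */
int wait_for_sigusr1(int start_notifier, long timeout_ms)
{
    const struct timespec timeout = {
        .tv_sec  = timeout_ms / 1000,
        .tv_nsec = (timeout_ms % 1000) * 1000000L
    };
    const struct timespec zero = { .tv_sec = 0, .tv_nsec = 0 };
    pthread_t self = pthread_self(), tid;
    sigset_t set, saved;
    int sig, started = 0;

    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    pthread_sigmask(SIG_BLOCK, &set, &saved);  /* must be blocked before waiting */

    if (start_notifier)
        started = (pthread_create(&tid, NULL, notifier, &self) == 0);

    do {
        sig = sigtimedwait(&set, NULL, &timeout);
    } while (sig == -1 && errno == EINTR);     /* restarts with the full timeout */

    if (started) {
        pthread_join(tid, NULL);
        if (sig == -1)                         /* pick up a late notification so */
            sig = sigtimedwait(&set, NULL, &zero);  /* it is not left pending */
    }
    pthread_sigmask(SIG_SETMASK, &saved, NULL);
    return sig;
}
```

No handler is installed and no timer has to be stopped afterwards; execution simply continues after the call, which is what the question asks for.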

sigwait() and signal handler

Suppose I set up a signal handler for SIGABRT and meanwhile have a thread that waits in sigwait() for SIGABRT to arrive (I have blocked SIGABRT in the other threads with pthread_sigmask).
Which one will be processed first: the signal handler or sigwait()?
[I am facing an issue where sigwait() blocks forever. I am debugging it currently.]
main()
{
sigset_t signal_set;
sigemptyset(&signal_set);
sigaddset(&signal_set, SIGABRT);
sigprocmask(SIG_BLOCK, &signal_set, NULL);
// Dont deliver SIGABORT while running this thread and it's kids.
pthread_sigmask(SIG_BLOCK, &signal_set, NULL);
pthread_create(&tAbortWaitThread, NULL, WaitForAbortThread, NULL);
..
Create all other threads
...
}
static void* WaitForAbortThread(void* v)
{
sigset_t signal_set;
int stat;
int sig;
sigfillset( &signal_set);
pthread_sigmask( SIG_BLOCK, &signal_set, NULL ); // Dont want any signals
sigemptyset(&signal_set);
sigaddset(&signal_set, SIGABRT); // Add only SIGABRT
// This thread while executing , will handle the SIGABORT signal via signal handler.
pthread_sigmask(SIG_UNBLOCK, &signal_set, NULL);
stat= sigwait( &signal_set, &sig ); // lets wait for signal handled in CatchAbort().
while (stat == -1)
{
stat= sigwait( &signal_set, &sig );
}
TellAllThreadsWeAreGoingDown();
sleep(10);
return NULL;
}
// Abort signal handler executed via sigaction().
static void CatchAbort(int i, siginfo_t* info, void* v)
{
sleep(20); // Dont return , hold on till the other threads are down.
}
At sigwait(), I will learn that SIGABRT was received and tell the other threads about it. The abort signal handler will then hold on so that the process is not terminated.
I wanted to know the interaction of sigwait() and the signal handler.
From sigwait() documentation :
The sigwait() function suspends execution of the calling thread until
one of the signals specified in the signal set becomes pending.
A pending signal is one that has been generated but not yet delivered, typically because it is blocked. Therefore, you should not unblock the signal as you did with your pthread_sigmask(SIG_UNBLOCK, &signal_set, NULL) call; sigwait() expects the signals in its set to be blocked when it is called.
This should work :
static void* WaitForAbortThread(void* v){
sigset_t signal_set;
int sig;
sigemptyset(&signal_set);
sigaddset(&signal_set, SIGABRT);
sigwait( &signal_set, &sig ); // SIGABRT is still blocked here (mask inherited from main)
TellAllThreadsWeAreGoingDown();
sleep(10);
return NULL;
}
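The dedicated-waiter pattern can be reduced to a small self-contained sketch. The names run_sigwait_thread() and struct waiter are hypothetical; the signal is blocked before the thread is created so the waiter inherits the blocked mask, and no handler is installed for it. Note that sigwait() returns 0 on success and a positive error number on failure, never -1:

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>

struct waiter {
    int signo;      /* signal to wait for */
    int received;   /* signal returned by sigwait(), or -1 */
};

static void *waiter_thread(void *arg)
{
    struct waiter *w = arg;
    sigset_t set;
    int sig = 0;

    sigemptyset(&set);
    sigaddset(&set, w->signo);
    /* signo is already blocked here: the mask was inherited from the
       creating thread, so no pthread_sigmask() call is needed. */
    if (sigwait(&set, &sig) == 0)
        w->received = sig;
    return NULL;
}

/* Blocks signo, starts a waiter thread, sends it the signal, and
   returns the signal number the waiter received, or -1 on failure. */
int run_sigwait_thread(int signo)
{
    struct waiter w = { signo, -1 };
    sigset_t set, saved;
    pthread_t tid;

    sigemptyset(&set);
    sigaddset(&set, signo);
    pthread_sigmask(SIG_BLOCK, &set, &saved);  /* block before creating threads */

    if (pthread_create(&tid, NULL, waiter_thread, &w) == 0) {
        pthread_kill(tid, signo);   /* stays pending until sigwait() takes it */
        pthread_join(tid, NULL);
    }
    pthread_sigmask(SIG_SETMASK, &saved, NULL);
    return w.received;
}
```

Because the signal stays blocked in every thread, it does not matter whether it is sent before or after the waiter reaches sigwait(): it remains pending until the waiter consumes it.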
I got some information from this <link>
It says :
To allow a thread to wait for asynchronously generated signals, the threads library provides the sigwait subroutine. The sigwait subroutine blocks the calling thread until one of the awaited signals is sent to the process or to the thread. There must not be a signal handler installed on the awaited signal using the sigwait subroutine.
I will remove the sigaction() handler and try only sigwait().
From the code snippet you've posted, it seems you got the use of sigwait() wrong. AFAIU, you need WaitForAbortThread to look like this:
sigemptyset(&signal_set); // changed from sigfillset()
sigaddset(&signal_set, SIGABRT); // wait for SIGABRT only
for (;;) {
stat = sigwait(&signal_set, &sig);
if (stat == 0 && sig == SIGABRT) {
printf("here's SIGABRT.. do whatever you want.\n");
pthread_kill(tid, sig); // thread id and signal
}
}
I don't think a further pthread_sigmask() call is needed in this thread: main() already blocks SIGABRT before creating it, and sigwait() requires the signal to stay blocked. Since you only want to handle SIGABRT, first initialize signal_set as empty, then add SIGABRT, then enter the infinite loop. sigwait() will wait for that particular signal; check whether the returned signal is SIGABRT and, if so, do whatever you want. Note the use of pthread_kill(): use it to send a signal to another thread identified by tid, so make sure you know the thread IDs of the threads you want to signal. Hope this helps!
I know this question is about a year old, but I often use a pattern that solves exactly this issue using pthreads and signals. It is a little lengthy but takes care of every issue I am aware of.
I recently used it in combination with a library wrapped with SWIG and called from within Python. An annoying issue was that my IRQ thread, waiting for SIGINT using sigwait, never received the SIGINT signal. The same library worked perfectly when called from Matlab, which didn't capture the SIGINT signal.
The solution was to install a signal handler:
#define _GNU_SOURCE // needed for cpu_set_t, the CPU_* macros and sched_getaffinity()
#define _NTHREADS 8
#include <signal.h>
#include <pthread.h>
#include <unistd.h>
#include <sched.h>
#include <linux/unistd.h>
#include <sys/signal.h>
#include <sys/syscall.h>
#include <setjmp.h>
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h> // strerror
#include <time.h> // time
#define CallErr(fun, arg) { if ((fun arg)<0) \
FailErr(#fun) }
#define CallErrExit(fun, arg, ret) { if ((fun arg)<0) \
FailErrExit(#fun,ret) }
#define FailErrExit(msg,ret) { \
(void)fprintf(stderr, "FAILED: %s(errno=%d strerror=%s)\n", \
msg, errno, strerror(errno)); \
(void)fflush(stderr); \
return ret; }
#define FailErr(msg) { \
(void)fprintf(stderr, "FAILED: %s(errno=%d strerror=%s)\n", \
msg, errno, strerror(errno)); \
(void)fflush(stderr);}
typedef struct thread_arg {
int cpu_id;
int thread_id;
} thread_arg_t;
static jmp_buf jmp_env;
static struct sigaction act;
static struct sigaction oact;
size_t exitnow = 0;
pthread_mutex_t exit_mutex;
pthread_attr_t attr;
pthread_t pids[_NTHREADS];
pid_t tids[_NTHREADS+1];
static volatile int status[_NTHREADS]; // 0: suspended, 1: interrupted, 2: success
sigset_t mask;
static pid_t gettid( void );
static void *thread_function(void *arg);
static void signalHandler(int);
int main() {
cpu_set_t cpuset;
int nproc;
int i;
thread_arg_t thread_args[_NTHREADS];
int id;
CPU_ZERO( &cpuset );
CallErr(sched_getaffinity,
(gettid(), sizeof( cpu_set_t ), &cpuset));
nproc = CPU_COUNT(&cpuset);
for (i=0 ; i < _NTHREADS ; i++) {
thread_args[i].cpu_id = i % nproc;
thread_args[i].thread_id = i;
status[i] = 0;
}
pthread_attr_init(&attr);
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
pthread_mutex_init(&exit_mutex, NULL);
// We pray for no locks on buffers and that setbuf will work; if not, we
// need to use flockfile() on every FILE* access, tricky
setbuf(stdout, NULL);
setbuf(stderr, NULL);
act.sa_flags = SA_NOCLDSTOP | SA_NOCLDWAIT;
act.sa_handler = signalHandler;
sigemptyset(&act.sa_mask);
sigemptyset(&mask);
sigaddset(&mask, SIGINT);
if (setjmp(jmp_env)) {
if (gettid()==tids[0]) {
// Main Thread
printf("main thread: waiting for clients to terminate\n");
for (i = 0; i < _NTHREADS; i++) {
CallErr(pthread_join, (pids[i], NULL));
if (status[i] == 1)
printf("thread %d: terminated\n",i+1);
}
// On linux this can be done immediate after creation
CallErr(pthread_attr_destroy, (&attr));
CallErr(pthread_mutex_destroy, (&exit_mutex));
return 0;
}
else {
// Should never happen
printf("worker thread received signal");
}
return -1;
}
// Install handler
CallErr(sigaction, (SIGINT, &act, &oact));
// Block SIGINT
CallErr(pthread_sigmask, (SIG_BLOCK, &mask, NULL));
tids[0] = gettid();
srand ( time(NULL) );
for (i = 0; i < _NTHREADS; i++) {
// Each thread inherits the main thread's signal mask, so SIGINT is blocked in it
CallErr(pthread_create,
(&pids[i], &attr, thread_function,
(void *)&thread_args[i]));
}
if (pthread_sigmask(SIG_UNBLOCK, &mask, NULL)) {
fprintf(stderr, "main thread: can't unblock SIGINT\n");
}
printf("Infinite loop started - CTRL-C to exit\n");
for (i = 0; i < _NTHREADS; i++) {
CallErr(pthread_join, (pids[i], NULL));
//printf("%d\n",status[i]);
if (status[i] == 2)
printf("thread %d: finished successfully\n",i+1);
}
// Clean up and exit
CallErr(pthread_attr_destroy, (&attr));
CallErr(pthread_mutex_destroy, (&exit_mutex));
return 0;
}
static void signalHandler(int sig) {
int i;
pthread_t id;
id = pthread_self();
for (i = 0; i < _NTHREADS; i++)
if (pids[i] == id) {
// Exits if worker thread
printf("Worker thread caught signal");
break;
}
if (sig == SIGINT) { // use the symbolic name instead of the magic number 2
sigaction(SIGINT, &oact, &act);
}
pthread_mutex_lock(&exit_mutex);
if (!exitnow)
exitnow = 1;
pthread_mutex_unlock(&exit_mutex);
longjmp(jmp_env, 1);
}
void *thread_function(void *arg) {
cpu_set_t set;
thread_arg_t* threadarg;
int thread_id;
threadarg = (thread_arg_t*) arg;
thread_id = threadarg->thread_id+1;
tids[thread_id] = gettid();
CPU_ZERO( &set );
CPU_SET( threadarg->cpu_id, &set );
CallErrExit(sched_setaffinity, (gettid(), sizeof(cpu_set_t), &set ),
NULL);
int k = 8;
// While loop waiting for exit condition
while (k>0) {
sleep(rand() % 3);
pthread_mutex_lock(&exit_mutex);
if (exitnow) {
status[threadarg->thread_id] = 1;
pthread_mutex_unlock(&exit_mutex);
pthread_exit(NULL);
}
pthread_mutex_unlock(&exit_mutex);
k--;
}
status[threadarg->thread_id] = 2;
pthread_exit(NULL);
}
static pid_t gettid( void ) {
pid_t pid;
CallErr(pid = syscall, (__NR_gettid));
return pid;
}
I ran several tests; the combinations and results are below.
For all test cases, I register a signal handler by calling sigaction in the main thread.
1. The main thread blocks the target signal, thread A unblocks it by calling pthread_sigmask, thread A sleeps, then the target signal is sent.
result: the signal handler is executed in thread A.
2. The main thread blocks the target signal, thread A unblocks it by calling pthread_sigmask, thread A calls sigwait, then the target signal is sent.
result: sigwait returns.
3. The main thread does not block the target signal, thread A does not block it either, thread A calls sigwait, then the target signal is sent.
result: the main thread is chosen and the registered signal handler is executed in the main thread.
As you can see, combinations 1 and 2 are easy to understand and draw conclusions from.
The conclusions are:
If a signal is blocked by a thread, the process-wide signal handler registered by sigaction cannot catch it in that thread, or even learn of it.
If a signal is not blocked and it is sent before sigwait is called, the process-wide signal handler wins. That is why APUE tells us to block the target signal before calling sigwait. Here I use sleep in thread A to simulate a long window before sigwait.
If a signal is not blocked and it is sent while sigwait is already waiting, sigwait wins.
Note, however, that in test cases 1 and 2 the main thread blocks the target signal.
Finally, in test case 3, where the main thread does not block the target signal and sigwait in thread A is already waiting, the signal handler is executed in the main thread.
I believe the behaviour of test case 3 is what APUE describes:
From APUE §12.8:
If a signal is being caught (the process has established a signal
handler by using sigaction, for example) and a thread is waiting for
the same signal in a call to sigwait, it is left up to the
implementation to decide which way to deliver the signal. The
implementation could either allow sigwait to return or invoke the
signal handler, but not both.
In summary, if you want a one thread <-> one signal model, you should:
block all signals in the main thread with pthread_sigmask (threads subsequently created by the main thread inherit the signal mask)
create the threads and call sigwait(target_signal) in each of them.
test code
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
FILE* file;
void* threadA(void* argv){
fprintf(file, "%ld\n", pthread_self());
sigset_t m;
sigemptyset(&m);
sigaddset(&m, SIGUSR1);
int signo;
int err;
// sigset_t q;
// sigemptyset(&q);
// pthread_sigmask(SIG_SETMASK, &q, NULL);
// sleep(50);
fprintf(file, "1\n");
err = sigwait(&m, &signo);
if (err != 0){
fprintf(file, "sigwait error\n");
exit(1);
}
switch (signo)
{
case SIGUSR1:
fprintf(file, "SIGUSR1 received\n");
break;
default:
fprintf(file, "?\n");
break;
}
fprintf(file, "2\n");
return NULL; // a void* thread function must return a value
}
void hello(int signo){
fprintf(file, "%ld\n", pthread_self());
fprintf(file, "hello\n");
}
int main(){
file = fopen("daemon", "wb");
setbuf(file, NULL);
struct sigaction sa;
sigemptyset(&sa.sa_mask);
sa.sa_flags = 0; // sa_flags was left uninitialized in the original
sa.sa_handler = hello;
sigaction(SIGUSR1, &sa, NULL);
sigset_t n;
sigemptyset(&n);
sigaddset(&n, SIGUSR1);
// pthread_sigmask(SIG_BLOCK, &n, NULL);
pthread_t pid;
int err;
err = pthread_create(&pid, NULL, threadA, NULL);
if(err != 0){
fprintf(file, "create thread error\n");
exit(1);
}
pause();
fprintf(file, "after pause\n");
fclose(file);
return 0;
}
Run it with ./a.out & (in the background) and use kill -SIGUSR1 pid to test. Do not use raise: raise, sleep, and pause act only on the calling thread, not on the whole process.

recv() is not interrupted by a signal in multithreaded environment

I have a thread that sits in a blocking recv() loop, and I want to terminate it (assume this can't be changed to select() or any other asynchronous approach).
I also have a signal handler that catches SIGINT and theoretically it should make recv() return with error and errno set to EINTR.
But it doesn't, which I assume has something to do with the fact that the application is multi-threaded. There is also another thread, which is meanwhile waiting on a pthread_join() call.
What's happening here?
EDIT:
OK, now I explicitly deliver the signal to all blocking recv() threads via pthread_kill() from the main thread (which results in the same global SIGINT signal handler installed, though multiple invocations are benign). But recv() call is still not unblocked.
EDIT:
I've written a code sample that reproduces the problem.
Main thread connects a socket to a misbehaving remote host that won't let the connection go.
All signals blocked.
Read thread is started.
Main unblocks and installs handler for SIGINT.
Read thread unblocks and installs handler for SIGUSR1.
Main thread's signal handler sends a SIGUSR1 to the read thread.
Interestingly, if I replace recv() with sleep() it is interrupted just fine.
PS
Alternatively you can just open a UDP socket instead of using a server.
client
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h> /* memset, strerror */
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <arpa/inet.h>
#include <unistd.h>
#include <errno.h>
static void
err(const char *msg)
{
perror(msg);
abort();
}
static void
blockall()
{
sigset_t ss;
sigfillset(&ss);
if (pthread_sigmask(SIG_BLOCK, &ss, NULL))
err("pthread_sigmask");
}
static void
unblock(int signum)
{
sigset_t ss;
sigemptyset(&ss);
sigaddset(&ss, signum);
if (pthread_sigmask(SIG_UNBLOCK, &ss, NULL))
err("pthread_sigmask");
}
void
sigusr1(int signum)
{
(void)signum;
printf("%lu: SIGUSR1\n", pthread_self());
}
void*
read_thread(void *arg)
{
int sock, r;
char buf[100];
unblock(SIGUSR1);
signal(SIGUSR1, &sigusr1);
sock = *(int*)arg;
printf("Thread (self=%lu, sock=%d)\n", pthread_self(), sock);
r = 1;
while (r > 0)
{
r = recv(sock, buf, sizeof buf, 0);
printf("recv=%d\n", r);
}
if (r < 0)
perror("recv");
return NULL;
}
int sock;
pthread_t t;
void
sigint(int signum)
{
int r;
(void)signum;
printf("%lu: SIGINT\n", pthread_self());
printf("Killing %lu\n", t);
r = pthread_kill(t, SIGUSR1);
if (r)
{
printf("%s\n", strerror(r));
abort();
}
}
int
main()
{
pthread_attr_t attr;
struct sockaddr_in addr;
printf("main thread: %lu\n", pthread_self());
memset(&addr, 0, sizeof addr);
sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
if (sock < 0) /* test the descriptor, not the address of socket() */
err("socket");
addr.sin_family = AF_INET;
addr.sin_port = htons(8888);
if (inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr) <= 0)
err("inet_pton");
if (connect(sock, (struct sockaddr *)&addr, sizeof addr))
err("connect");
blockall();
pthread_attr_init(&attr);
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
if (pthread_create(&t, &attr, &read_thread, &sock))
err("pthread_create");
pthread_attr_destroy(&attr);
unblock(SIGINT);
signal(SIGINT, &sigint);
if (sleep(1000))
perror("sleep");
if (pthread_join(t, NULL))
err("pthread_join");
if (close(sock))
err("close");
return 0;
}
server
import socket
import time
s = socket.socket(socket.AF_INET)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.bind(('127.0.0.1',8888))
s.listen(1)
c = []
while True:
(conn, addr) = s.accept()
c.append(conn)
Normally signals do not interrupt system calls with EINTR. Historically there were two possible signal delivery behaviors: the BSD behavior (syscalls are automatically restarted when interrupted by a signal) and the Unix System V behavior (syscalls return -1 with errno set to EINTR when interrupted by a signal). Linux (the kernel) adopted the latter, but the GNU C library developers (correctly) deemed the BSD behavior to be much more sane, and so on modern Linux systems, calling signal (which is a library function) results in the BSD behavior.
POSIX allows either behavior, so it's advisable to always use sigaction where you can choose to set the SA_RESTART flag or omit it depending on the behavior you want. See the documentation for sigaction here:
http://www.opengroup.org/onlinepubs/9699919799/functions/sigaction.html
In a multi-threaded application, normal signals can be delivered to any thread arbitrarily. Use pthread_kill to send the signal to the specific thread of interest.
Is the signal handler invoked in the same thread that waits in recv()?
You may need to explicitly mask SIGINT in all other threads via pthread_sigmask().
As alluded to in the post by <R..>, it is indeed possible to change how signals interact with system calls.
I often create my own "signal" function that makes use of sigaction. Here's what I use:
typedef void Sigfunc(int);
static Sigfunc*
_signal(int signum, Sigfunc* func)
{
struct sigaction act, oact;
act.sa_handler = func;
sigemptyset(&act.sa_mask);
act.sa_flags = 0;
if (signum != SIGALRM)
act.sa_flags |= SA_NODEFER; //SA_RESTART;
if (sigaction(signum, &act, &oact) < 0)
return (SIG_ERR);
return oact.sa_handler;
}
The relevant part above is the flag OR'ed into the sa_flags field. From the man page for sigaction: SA_RESTART provides the BSD-like behavior of making system calls restartable across signals. SA_NODEFER allows the signal to be received from within its own signal handler.
When the signal calls are replaced with "_signal", the thread is interrupted. When SIGUSR1 was sent, recv returned -1 and the output showed "Interrupted system call". When SIGINT was sent, the output was the same and the program then stopped altogether, because abort was called at the end.
I did not write the server portion of the code; I just changed the socket type to "DGRAM, UDP" so that the client could start.
You can set a timeout on recv on Linux; see the question "Linux: is there a read or recv from socket with timeout?"
When you get a signal, call done on the class doing the receive.
void* signalThread( void* ptr )
{
CapturePkts* cap=(CapturePkts*)ptr;
sigset_t sigSet=cap->getSigSet();
int sig=-1;
sigwait(&sigSet,&sig); //signalThread: signal capture thread enabled;
cout << "signal=" << sig << " caught,ending process" << endl;
cap->setDone();
return 0;
}
class CapturePkts
{
public:
CapturePkts() : _sockid(-1), _done(false) {}
sigset_t getSigSet() { return _sigSet; }
void setDone() {_done=true;}
bool receive( uint8_t *buffer, int32_t bufSz, int32_t &nbytes)
{
bool ret=true;
// Loop until setDone() is called from the signal thread
while( ! _done ) {
nbytes = ::recv( _sockid, buffer, bufSz, 0 );
if(nbytes < 1 ) {
if (errno == EAGAIN || errno == EWOULDBLOCK) {
nbytes=0; //timed out: wait for next read event
} else {
ret=false; //hard error: stop receiving
break;
}
}
}
return ret;
}
private:
int _sockid; // socket descriptor (its setup is not shown in the original post)
sigset_t _sigSet;
bool _done;
};

Resources