I have a library that accesses a hardware resource (SPI) via a 3rd party library. My library, and in turn the SPI resource, is accessed by multiple processes, so I need to lock the resource with semaphores. The lock functions are below:
static int spi_lock(void)
{
struct timespec ts;
if (clock_gettime(CLOCK_REALTIME, &ts) == -1)
{
syslog(LOG_ERR,"failed to read clock: %s\n", strerror(errno));
return 3;
}
ts.tv_sec += 5;
if (sem_timedwait(bcoms->spisem, &ts) == -1)
{
syslog(LOG_ERR,"timed out trying to acquire %s: %s\n", SPISEM, strerror(errno));
return 1;
}
return 0;
}
static int spi_unlock(void)
{
int ret = 1;
if (sem_post(bcoms->spisem))
{
syslog(LOG_ERR,"failed to release %s: %s\n", SPISEM, strerror(errno));
goto done;
}
ret = 0;
done:
return ret;
}
Now my problem is that the library is used in a daemon, and that daemon is stopped via a kill signal. Sometimes I get the kill signal while I am holding the semaphore lock, and hence the servers cannot be restarted successfully because the lock is perpetually taken. To fix this I am trying to block the signals as shown below (I am waiting for hardware to test this on at the moment):
static int spi_lock(void)
{
sigset_t nset;
struct timespec ts;
sigfillset(&nset);
sigprocmask(SIG_BLOCK, &nset, NULL);
if (clock_gettime(CLOCK_REALTIME, &ts) == -1)
{
syslog(LOG_ERR,"failed to read clock: %s\n", strerror(errno));
return 3;
}
ts.tv_sec += 5; // 5 seconds to acquire the semaphore is HEAPS, so we better bloody get it !!!
if (sem_timedwait(bcoms->spisem, &ts) == -1)
{
syslog(LOG_ERR,"timed out trying to acquire %s: %s\n", SPISEM, strerror(errno));
return 1;
}
return 0;
}
static int spi_unlock(void)
{
sigset_t nset;
int ret = 1;
if (sem_post(bcoms->spisem))
{
syslog(LOG_ERR,"failed to release %s: %s\n", SPISEM, strerror(errno));
goto done;
}
sigfillset(&nset);
sigprocmask(SIG_UNBLOCK, &nset, NULL);
ret = 0;
done:
return ret;
}
But having read the man page for sigprocmask(), it says that in a multi-threaded program pthread_sigmask() should be used instead, and one of the servers I want to protect will be multi-threaded. What I don't understand is this: if I use pthread_sigmask() in the library, and the main parent thread spawns an SPI read thread that uses those locking functions in my library, the read thread will be protected, but can't the main thread still receive the kill signal and take down the daemon while I am holding the semaphore with signals blocked on the read thread, getting me nowhere? If so, is there a better solution to this locking problem?
Thanks.
Indeed you've analyzed the problem correctly - masking signals does not protect you. Masking signals is not the right tool to prevent process termination with shared data (like files or shared semaphores) in an inconsistent state.
What you probably should be doing, if you want to exit gracefully on certain signals, is having the program install signal handlers to catch the termination request and feed it into your normal program logic. There are several approaches you can use:
Send the termination request over a pipe to yourself. This works well if your program is structured around a poll loop that can wait for input on a pipe.
Use sem_post, the one async-signal-safe synchronization function, to report the signal to the rest of the program.
Start a dedicated signal-handling thread from the main thread, then block all signals in the main thread (and, by inheritance, in all threads created after that). This thread can just do for(;;) pause(); and since the only code a handler can ever interrupt is pause(), which is async-signal-safe, you can call any functions you want from the signal handlers, including the pthread sync functions needed for synchronizing with other threads.
Note that this approach will still not be "perfect" since you can never catch or block SIGKILL. If a user decides to kill your process with SIGKILL (kill -9) then the semaphore can be left in a bad state and there's nothing you can do.
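The first approach (writing to a pipe from the handler) might look like the following minimal sketch. It assumes a single-threaded daemon whose main loop can poll() a descriptor; install_term_handler(), wait_for_term() and term_pipe are hypothetical names. The handler only calls write(), which is async-signal-safe, so the main loop stays free to finish its critical section (for example, call spi_unlock()) before exiting.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* term_pipe[0] is read by the main loop; term_pipe[1] is written by the handler. */
static int term_pipe[2] = { -1, -1 };

/* Async-signal-safe: only write() is called, and errno is preserved. */
static void on_term(int signo)
{
    int saved_errno = errno;
    unsigned char b = (unsigned char)signo;
    ssize_t r = write(term_pipe[1], &b, 1); /* if the pipe is full, a byte is already queued */

    (void)r;
    errno = saved_errno;
}

/* Create the self-pipe and route SIGTERM and SIGINT to on_term(). Returns 0 on success. */
static int install_term_handler(void)
{
    struct sigaction sa;

    if (pipe(term_pipe) == -1)
        return -1;
    /* Non-blocking, so neither the handler nor the reader can ever block on the pipe. */
    for (int i = 0; i < 2; i++) {
        int fl = fcntl(term_pipe[i], F_GETFL);
        if (fl == -1 || fcntl(term_pipe[i], F_SETFL, fl | O_NONBLOCK) == -1)
            return -1;
    }
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_term;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGTERM, &sa, NULL) == -1 || sigaction(SIGINT, &sa, NULL) == -1)
        return -1;
    return 0;
}

/* Wait up to timeout_ms for a termination request.
   Returns the signal number received, 0 on timeout, or -1 on error. */
static int wait_for_term(int timeout_ms)
{
    struct pollfd pfd = { .fd = term_pipe[0], .events = POLLIN };
    unsigned char b;
    int n;

    do {
        n = poll(&pfd, 1, timeout_ms);
    } while (n == -1 && errno == EINTR);
    if (n <= 0)
        return n;
    if (read(term_pipe[0], &b, 1) != 1)
        return -1;
    return b;
}
```

In the daemon, term_pipe[0] would sit in the same poll() set as the other descriptors; when it becomes readable, the loop releases any lock it holds and returns from main() normally.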
I don't think your approach will work. You cannot block SIGKILL or SIGSTOP, so blocking can only help if the daemon is getting a different signal (like SIGHUP). But even then, I think it's bad practice to block all signals from a library call. That can result in adverse effects on the calling application. For example, the application may be relying on particular signals, and missing any such signals could cause it to function incorrectly.
As it turns out there probably isn't an easy way to solve your problem using semaphores. So an alternative approach is to use something like "flock" instead. That solves your problem because it is based on open file descriptors. If a process dies holding an flock the associated file descriptor will be automatically closed and hence will free the flock.
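A minimal sketch of spi_lock()/spi_unlock() rebuilt on flock(), keeping the original return codes (0 success, 1 timeout, 3 error). The lock-file path SPI_LOCK_PATH and the helper spi_lock_open() are hypothetical, and since flock() has no timeout of its own, the sketch retries with LOCK_NB every 10 ms. Because flock() locks belong to the open file description, each process must open() the file itself; threads sharing one descriptor are not excluded from each other and would still need a pthread mutex.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/file.h>
#include <syslog.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical path; every process using the SPI bus must agree on it. */
#define SPI_LOCK_PATH "/tmp/spi.lock"

static int spi_lock_fd = -1;

/* Open (creating if necessary) the lock file; call once per process. */
static int spi_lock_open(const char *path)
{
    spi_lock_fd = open(path, O_RDWR | O_CREAT | O_CLOEXEC, 0666);
    return spi_lock_fd == -1 ? -1 : 0;
}

/* Try for up to timeout_ms to take the exclusive lock.
   The kernel drops it automatically if this process dies, even by SIGKILL. */
static int spi_lock(int timeout_ms)
{
    const struct timespec pause_10ms = { 0, 10 * 1000 * 1000 };

    for (int waited = 0; ; waited += 10) {
        if (flock(spi_lock_fd, LOCK_EX | LOCK_NB) == 0)
            return 0;
        if (errno != EWOULDBLOCK && errno != EINTR) {
            syslog(LOG_ERR, "flock failed: %s", strerror(errno));
            return 3;
        }
        if (waited >= timeout_ms)
            return 1;
        nanosleep(&pause_10ms, NULL);
    }
}

static int spi_unlock(void)
{
    if (flock(spi_lock_fd, LOCK_UN) == -1) {
        syslog(LOG_ERR, "flock unlock failed: %s", strerror(errno));
        return 1;
    }
    return 0;
}
```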
Related
I'm writing a multithreaded server program in C that works with AF_UNIX sockets.
The basic structure of the server is:
Main thread initializes data structures and spawns a pool of "worker" threads.
Worker threads start waiting for new requests on an empty thread-safe queue.
Main thread listens on various sockets (the connection socket and already connected clients) with a select() call.
select() reveals a possible read on the connection socket: the main thread calls accept() and puts the returned file descriptor in the fd_set (read set).
select() reveals a possible read on already connected sockets: the main thread removes the ready file descriptors from the fd_set (read set) and puts them in the thread-safe queue.
A worker thread extracts a file descriptor from the queue and starts to communicate with the linked client to serve the request. At the end of the service, the worker thread puts the socket file descriptor back into the fd_set (I wrote a function to make this operation thread-safe) and goes back to waiting on the queue for a new request.
This routine is repeated in an infinite loop until a SIGINT is raised.
Another function has to be performed on SIGUSR1 without exiting from the loop.
My doubt is about this: if I raise a SIGINT, my program exits with EINTR = Interrupted system call.
I know about the pselect() call and the "self pipe" trick, but I can't figure out how to make things work in a multithreaded situation.
I'm looking for a (POSIX-compatible) way of managing signals that prevents the EINTR error while the main thread is waiting on pselect().
Here are some pieces of code for clarification:
Here I set up the signal handlers (ignore the errorConsolePrint function):
if(signal(SIGINT, &on_SIGINT) == SIG_ERR)
{
errorConsolePrint("File: %s; Line: %d; ", "Setting SIGINT handler", __FILE__, __LINE__);
exit(EXIT_FAILURE);
}
if(signal(SIGTERM, &on_SIGINT) == SIG_ERR)
{
errorConsolePrint("File: %s; Line: %d; ", "Setting SIGTERM handler", __FILE__, __LINE__);
exit(EXIT_FAILURE);
}
if(signal(SIGUSR1, &on_SIGUSR1) == SIG_ERR)
{
errorConsolePrint("File: %s; Line: %d; ", "Setting to SIGUSR1 handler", __FILE__, __LINE__);
exit(EXIT_FAILURE);
}
if(signal(SIGPIPE, SIG_IGN) == SIG_ERR)
{
errorConsolePrint("File: %s; Line: %d; ", "Setting to ignore SIGPIPE", __FILE__, __LINE__);
exit(EXIT_FAILURE);
}
Here I set up the signal mask for pselect:
sigemptyset(&mask);
sigemptyset(&saveMask);
sigaddset(&mask, SIGINT);
sigaddset(&mask, SIGUSR1);
sigaddset(&mask, SIGPIPE);
Here I call pselect:
test = saveSet(masterSet, &backUpSet, &saveMaxFd);
CHECK_MINUS1(test, "Server: creating master set's backup ");
test = pselect(saveMaxFd+1, &backUpSet, NULL, NULL, &waiting, &mask);
if(test == -1 && errno != EINTR)
{
...error handling...
continue;
}
Hoping for some help!
Thank you all in advance.
What you should probably do is dedicate a thread to signal handling. Here's a sketch:
In main, before spawning any threads, block all signals (using pthread_sigmask) except for SIGILL, SIGABRT, SIGFPE, SIGSEGV, and SIGBUS.
Then, spawn your signal handler thread. This thread loops calling sigwaitinfo for the signals you care about. It takes whatever action is appropriate for each; this could include sending a message to the main thread to trigger a clean shutdown (SIGINT), queuing the "another function" to be processed in the worker pool (SIGUSR1), etc. You do not install handlers for these signals.
Then you spawn your thread pool, which doesn't have to care about signals at all.
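The steps above might be sketched like this. The flags stop_requested and usr1_count are hypothetical stand-ins for "tell the main thread to shut down" and "queue the SIGUSR1 job"; a real server could instead write to a pipe watched by the main thread's select(). start_signal_thread() must be called from main() before any other thread is created; compile with -pthread.

```c
#define _XOPEN_SOURCE 700
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>

static atomic_int stop_requested;   /* hypothetical shutdown flag */
static atomic_int usr1_count;       /* hypothetical "SIGUSR1 job" counter */
static sigset_t handled;            /* signals the signal thread waits for */

/* No handlers are installed: this thread collects the signals synchronously. */
static void *signal_thread(void *arg)
{
    (void)arg;
    for (;;) {
        siginfo_t info;

        if (sigwaitinfo(&handled, &info) == -1)
            continue;                                /* EINTR: wait again */
        switch (info.si_signo) {
        case SIGUSR1:
            atomic_fetch_add(&usr1_count, 1);        /* e.g. queue the extra job */
            break;
        case SIGINT:
        case SIGTERM:
            atomic_store(&stop_requested, 1);        /* e.g. wake the main thread */
            return NULL;
        }
    }
}

static int start_signal_thread(pthread_t *tid)
{
    sigset_t all;

    sigemptyset(&handled);
    sigaddset(&handled, SIGINT);
    sigaddset(&handled, SIGTERM);
    sigaddset(&handled, SIGUSR1);

    /* Block everything except the synchronous, fault-type signals. */
    sigfillset(&all);
    sigdelset(&all, SIGILL);
    sigdelset(&all, SIGABRT);
    sigdelset(&all, SIGFPE);
    sigdelset(&all, SIGSEGV);
    sigdelset(&all, SIGBUS);
    if (pthread_sigmask(SIG_BLOCK, &all, NULL) != 0)
        return -1;
    /* The signal thread, and every thread created later, inherits this mask. */
    return pthread_create(tid, NULL, signal_thread, NULL) == 0 ? 0 : -1;
}
```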
I would suggest the following strategy:
During initialization, set up your signal handlers, as you do.
During initialization, block all (blockable) signals. See for example Is it possible to ignore all signals?.
Use pselect in your main thread to unblock the signals for the duration of the call, again as you do.
This has the advantage that all of your system calls, including all those in all your worker threads, will never return EINTR, except for the single pselect in the main thread. See for example the answers to Am I over-engineering per-thread signal blocking? and pselect does not return on signal when called from a separate thread but works fine in single thread program.
This strategy would also work with select: just unblock the signals in your main thread immediately before calling select, and re-block them afterwards. You only really need pselect to prevent hanging if your select timeout is long or infinite, and if your file descriptors are mostly inactive. (I've never used pselect myself, having worked mostly with older Unixes which did not have it.)
I am presuming that your signal handlers are suitable: for example, they just atomically set a global variable.
BTW, in your sample code, do you need sigaddset(&mask, SIGPIPE), as SIGPIPE is already ignored?
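A minimal sketch of this strategy, with hypothetical names setup_signals() and wait_readable(). One detail worth checking in the question's code: the last argument of pselect() is the signal mask in effect while pselect() waits, so it has to be the mask with SIGINT and SIGUSR1 removed; passing a mask that contains them keeps them blocked for exactly the period in which you want to receive them.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>

/* Flags set by the handler and tested by the main loop. */
static volatile sig_atomic_t server_stop;
static volatile sig_atomic_t pending_usr1;

static void on_signal(int signo)
{
    if (signo == SIGUSR1)
        pending_usr1 = 1;
    else
        server_stop = 1;
}

/* Install handlers, then block SIGINT/SIGTERM/SIGUSR1 in the calling thread.
   Call before creating the workers so they inherit the blocked mask.
   On success, *wait_mask is the mask to give pselect(): the same signals unblocked. */
static int setup_signals(sigset_t *wait_mask)
{
    struct sigaction sa;
    sigset_t block;

    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = SIG_IGN;
    if (sigaction(SIGPIPE, &sa, NULL) == -1)
        return -1;
    sa.sa_handler = on_signal;
    if (sigaction(SIGINT, &sa, NULL) == -1 || sigaction(SIGTERM, &sa, NULL) == -1 ||
        sigaction(SIGUSR1, &sa, NULL) == -1)
        return -1;

    sigemptyset(&block);
    sigaddset(&block, SIGINT);
    sigaddset(&block, SIGTERM);
    sigaddset(&block, SIGUSR1);
    if (pthread_sigmask(SIG_BLOCK, &block, wait_mask) != 0)
        return -1;
    /* wait_mask holds the previous mask; make sure our signals are open in it. */
    sigdelset(wait_mask, SIGINT);
    sigdelset(wait_mask, SIGTERM);
    sigdelset(wait_mask, SIGUSR1);
    return 0;
}

/* The signals can only be delivered inside this pselect(), so -1 with
   errno == EINTR means "check server_stop and pending_usr1". */
static int wait_readable(int fd, const sigset_t *wait_mask)
{
    fd_set rset;

    FD_ZERO(&rset);
    FD_SET(fd, &rset);
    return pselect(fd + 1, &rset, NULL, NULL, NULL, wait_mask);
}
```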
OK, I finally found a solution.
The heart of my problem was the multithreaded nature of my server.
After a long search I found out that when signals are raised by another process (asynchronously), it doesn't matter which thread catches the signal, because the behaviour stays the same: the signal is caught and the previously registered handler is executed.
Maybe this is obvious to others, but it was driving me crazy because I did not know how to interpret the errors that came out during execution.
After that I found, and solved, another problem, involving the obsolete signal() call.
During execution, the first time I raised SIGUSR1 the program caught and handled it as expected, but the second time it exited with "User defined signal 1".
I figured out that the signal() call sets a "one-time" handler for a specific signal: after the signal has been handled once, the behaviour for that signal reverts to the default.
So here's what I did:
Here are the signal handlers:
N.B.: I reset the handler for SIGUSR1 inside the handler itself.
static void on_SIGINT(int signum)
{
if(signum == SIGINT || signum == SIGTERM)
serverStop = TRUE;
}
static void on_SIGUSR1(int signum)
{
if(signum == SIGUSR1)
pendingSIGUSR1 = TRUE;
if(signal(SIGUSR1, &on_SIGUSR1) == SIG_ERR)
exit(EXIT_FAILURE);
}
Here I set the handlers during the server's initialization:
if(signal(SIGINT, &on_SIGINT) == SIG_ERR)
exit(EXIT_FAILURE);
if(signal(SIGTERM, &on_SIGINT) == SIG_ERR)
exit(EXIT_FAILURE);
if(signal(SIGUSR1, &on_SIGUSR1) == SIG_ERR)
exit(EXIT_FAILURE);
if(signal(SIGPIPE, SIG_IGN) == SIG_ERR)
exit(EXIT_FAILURE);
And here is the server's listening loop:
while(!serverStop)
{
if (pendingSIGUSR1)
{
... things i have to do on SIGUSR1...
pendingSIGUSR1 = FALSE;
}
test = saveSet(masterSet, &backUpSet, &saveMaxFd);
CHECK_MINUS1(test, "Server: creating master set's backup ");
test = select(saveMaxFd+1, &backUpSet, NULL, NULL, &waiting);
if((test == -1 && errno == EINTR) || test == 0)
continue;
if (test == -1 && errno != EINTR)
{
perror("Server: Monitoring sockets: ");
exit(EXIT_FAILURE);
}
for(int sock=3; sock <= saveMaxFd; sock++)
{
if (FD_ISSET(sock, &backUpSet))
{
if(sock == ConnectionSocket)
{
ClientSocket = accept(ConnectionSocket, NULL, 0);
CHECK_MINUS1(ClientSocket, "Server: Accepting connection");
test = INset(masterSet, ClientSocket);
CHECK_MINUS1(test, "Server: Inserting new connection in master set: ");
}
else
{
test = OUTset(masterSet, sock);
CHECK_MINUS1(test, "Server: Removing file descriptor from select ");
test = insertRequest(chain, sock);
CHECK_MINUS1(test, "Server: Inserting request in chain");
}
}
}
}
Read signal(7) and signal-safety(7) first; you might want to use the Linux-specific signalfd(2), since it fits nicely (for SIGTERM, SIGQUIT and SIGINT) into event loops around poll(2) or the old select(2) (or the newer pselect or ppoll).
See also this answer (and the pipe(7) to self trick mentioned there, which is POSIX-compatible) to a very similar question.
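A minimal Linux-only sketch of the signalfd(2) route; make_term_signalfd() and read_signal() are hypothetical helpers. The signals have to be blocked first, otherwise their default action (terminating the process) still takes place before the descriptor ever becomes readable.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <signal.h>
#include <sys/signalfd.h>
#include <unistd.h>

/* Block SIGINT, SIGTERM and SIGQUIT in the calling thread and return a descriptor
   that becomes readable whenever one of them is pending; -1 on error.
   Call it before creating other threads, so they inherit the blocked mask. */
static int make_term_signalfd(void)
{
    sigset_t set;

    sigemptyset(&set);
    sigaddset(&set, SIGINT);
    sigaddset(&set, SIGTERM);
    sigaddset(&set, SIGQUIT);
    if (pthread_sigmask(SIG_BLOCK, &set, NULL) != 0)
        return -1;
    return signalfd(-1, &set, SFD_CLOEXEC);
}

/* Consume one pending signal from sfd and return its number, or -1 on error. */
static int read_signal(int sfd)
{
    struct signalfd_siginfo si;

    if (read(sfd, &si, sizeof si) != (ssize_t)sizeof si)
        return -1;
    return (int)si.ssi_signo;
}
```

The descriptor then goes into the same poll()/select() set as the sockets; when it is readable, read_signal() reports which signal arrived, and no handler or EINTR handling is involved.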
Also, signal(2) documents:
The effects of signal() in a multithreaded process are unspecified.
so you really should use sigaction(2) (which is POSIX).
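The question's handlers could be installed with sigaction() roughly as follows. This is a sketch: serverStop and pendingSIGUSR1 are the question's flag names, re-declared here as volatile sig_atomic_t, and install_handler()/install_server_handlers() are hypothetical helpers. Because the disposition is not reset after delivery (no SA_RESETHAND), on_SIGUSR1 no longer has to re-install itself; leaving SA_RESTART out means select() still returns EINTR on a signal, which is what lets the question's loop re-check its flags.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>

/* Flags shared with handlers should be volatile sig_atomic_t. */
static volatile sig_atomic_t serverStop;
static volatile sig_atomic_t pendingSIGUSR1;

static void on_SIGINT(int signum)
{
    (void)signum;
    serverStop = 1;
}

static void on_SIGUSR1(int signum)
{
    (void)signum;
    pendingSIGUSR1 = 1;   /* no re-installation needed with sigaction() */
}

/* Install a handler that stays installed after it runs. */
static int install_handler(int signo, void (*handler)(int))
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;      /* no SA_RESETHAND, no SA_RESTART */
    return sigaction(signo, &sa, NULL);
}

static int install_server_handlers(void)
{
    if (install_handler(SIGINT, on_SIGINT) == -1 ||
        install_handler(SIGTERM, on_SIGINT) == -1 ||
        install_handler(SIGUSR1, on_SIGUSR1) == -1 ||
        install_handler(SIGPIPE, SIG_IGN) == -1)
        return -1;
    return 0;
}
```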
I'm writing code in which I have two threads running in parallel.
The 1st is the main thread, which started the 2nd thread.
The 2nd thread is just a simple thread executing an empty while loop.
Now I want to pause / suspend the execution of the 2nd thread from the 1st thread, which created it.
And after some time I want to resume the execution of 2nd thread (by issuing some command or function) from where it was paused / suspended.
This question is not about how to use mutexes, but how to suspend a thread.
Some Unix systems provide non-standard thread functions called pthread_suspend_np and pthread_resume_np, but they are not part of POSIX, and for some reason the people who make Linux have not implemented them.
So, to be clear, on Linux the functions simply are not there. There are workarounds, but unfortunately they are just not the same as calling SuspendThread on Windows. You have to do all kinds of non-portable stuff to make a thread stop and start using signals.
Stopping and resuming threads is vital for debuggers and garbage collectors. For example, I have seen a version of Wine which is not able to properly implement the "SuspendThread" function. Thus any windows program using it will not work properly.
I thought it was possible to do this properly with signals, given that the JVM uses signals for its garbage collector, but I have also just seen some articles online where people notice deadlocks and so on with the JVM, sometimes unreproducible.
So, to come around to answering the question: you cannot properly suspend and resume threads on Unix unless you have a nice Unix that implements pthread_suspend_np. Otherwise you are stuck with signals.
The big problem with signals is when you have about five different libraries all linked into the same program, all trying to use the same signals at the same time. For this reason I believe that you cannot actually use something like Valgrind and, for example, the Boehm GC in one program, at least not without major coding at the very lowest levels of userspace.
Another answer to this question could be: do what Linus Torvalds does to NVIDIA, flip the finger at him and get him to implement the two most critical parts missing from Linux: first, pthread_suspend, and second, a dirty bit on memory pages so that proper garbage collectors can be implemented. Start a large petition online and keep flipping that finger. Maybe by the time Windows 20 comes out, they will realise that suspending and resuming threads, and having dirty bits, are actually among the fundamental reasons Windows and Mac are better than Linux, or any Unix that does not implement pthread_suspend and also a dirty bit on virtual pages, like VirtualAlloc does in Windows.
I do not live in hope. Actually, I spent a number of years planning my future around building stuff for Linux, but have abandoned hope, as anything reliable seems to hinge on the availability of a dirty bit for virtual memory and on suspending threads cleanly.
As far as I know you can't really just pause some other thread using pthreads. You have to have something in your 2nd thread that checks for times it should be paused using something like a condition variable. This is the standard way to do this sort of thing.
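A minimal sketch of that standard approach, with hypothetical names (struct pause_gate, gate_pause(), gate_resume(), pause_point()). The worker can only stop at the points where it calls pause_point(), which is the price of doing this portably, but while it is paused it sleeps in pthread_cond_wait() instead of spinning.

```c
#define _XOPEN_SOURCE 700
#include <pthread.h>
#include <stdbool.h>

/* The controlling thread flips `paused`; the worker checks it at safe points. */
struct pause_gate {
    pthread_mutex_t lock;
    pthread_cond_t  resumed;
    bool            paused;
};

#define PAUSE_GATE_INIT { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false }

static void gate_pause(struct pause_gate *g)
{
    pthread_mutex_lock(&g->lock);
    g->paused = true;
    pthread_mutex_unlock(&g->lock);
}

static void gate_resume(struct pause_gate *g)
{
    pthread_mutex_lock(&g->lock);
    g->paused = false;
    pthread_cond_broadcast(&g->resumed);
    pthread_mutex_unlock(&g->lock);
}

/* Called by the worker between units of work; blocks while the gate is paused. */
static void pause_point(struct pause_gate *g)
{
    pthread_mutex_lock(&g->lock);
    while (g->paused)                 /* the loop also guards against spurious wakeups */
        pthread_cond_wait(&g->resumed, &g->lock);
    pthread_mutex_unlock(&g->lock);
}
```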
I tried suspending and resuming a thread using signals; here is my solution. Please compile and link with -pthread.
Signal SIGUSR1 suspends the thread by calling pause() and SIGUSR2 resumes the thread.
From the man page of pause:
pause() causes the calling process (or thread) to sleep until a signal is delivered that either terminates the process or causes the invocation of a signal-catching function.
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>
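#include <signal.h>
#include <stdatomic.h>
// Since I have only 2 threads, I am using two variables;
// an array of flags would be more useful for `n` threads.
// They are atomic because main() spins on them while the threads set them.
static atomic_int is_th1_ready = 0;
static atomic_int is_th2_ready = 0;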
static void cb_sig(int signal)
{
switch(signal) {
case SIGUSR1:
pause();
break;
case SIGUSR2:
break;
}
}
static void *thread_job(void *t_id)
{
int i = 0;
struct sigaction act;
pthread_detach(pthread_self());
sigemptyset(&act.sa_mask);
act.sa_flags = 0;
act.sa_handler = cb_sig;
if (sigaction(SIGUSR1, &act, NULL) == -1)
printf("unable to handle siguser1\n");
if (sigaction(SIGUSR2, &act, NULL) == -1)
printf("unable to handle siguser2\n");
if (t_id == (void *)1)
is_th1_ready = 1;
if (t_id == (void *)2)
is_th2_ready = 1;
while (1) {
printf("thread id: %p, counter: %d\n", t_id, i++);
sleep(1);
}
return NULL;
}
int main()
{
int terminate = 0;
int user_input;
pthread_t thread1, thread2;
pthread_create(&thread1, NULL, thread_job, (void *)1);
// Spawned thread2 just to make sure it isn't suspended/paused
// when thread1 received SIGUSR1/SIGUSR2 signal
pthread_create(&thread2, NULL, thread_job, (void *)2);
while (!is_th1_ready || !is_th2_ready); // wait until both threads have installed their handlers
while (!terminate) {
// to test, I am sending signals depending on input from STDIN
printf("0: pause thread1, 1: resume thread1, -1: exit\n");
scanf("%d", &user_input);
switch(user_input) {
case -1:
printf("terminating\n");
terminate = 1;
break;
case 0:
printf("raising SIGUSR1 to thread1\n");
pthread_kill(thread1, SIGUSR1);
break;
case 1:
printf("raising SIGUSR2 to thread1\n");
pthread_kill(thread1, SIGUSR2);
break;
}
}
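// Returning from main() ends the whole process, both threads included.
// (pthread_kill(thread, SIGKILL) cannot target one thread: SIGKILL
// always terminates the entire process.)
return 0;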
}
There are no pthread_suspend()/pthread_resume() kind of APIs in POSIX.
Condition variables can usually be used to control the execution of other threads instead:
The condition variable mechanism allows threads to suspend execution and relinquish the processor until some condition is true. A condition variable must always be associated with a mutex to avoid a race condition created by one thread preparing to wait and another thread which may signal the condition before the first thread actually waits on it resulting in a deadlock.
For more info
Pthreads
Linux Tutorial Posix Threads
If you can use processes instead, you can send job control signals (SIGSTOP / SIGCONT) to the second process. If you still want to share the memory between those processes, you can use SysV shared memory (shmop, shmget, shmctl...).
Even though I haven't tried it myself, it might be possible to use the lower-level clone() syscall to spawn threads that don't share signals. With that, you might be able to send SIGSTOP and SIGCONT to the other thread.
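A minimal sketch of the process variant, with hypothetical helpers stop_child() and resume_child(). Calling waitpid() with WUNTRACED or WCONTINUED confirms that the kernel has actually stopped or resumed the child before the helper returns.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Send SIGSTOP and wait until the kernel reports the child as stopped. */
static int stop_child(pid_t pid)
{
    int status;

    if (kill(pid, SIGSTOP) == -1)
        return -1;
    if (waitpid(pid, &status, WUNTRACED) != pid)
        return -1;
    return WIFSTOPPED(status) ? 0 : -1;
}

/* Send SIGCONT and wait until the kernel reports the child as continued. */
static int resume_child(pid_t pid)
{
    int status;

    if (kill(pid, SIGCONT) == -1)
        return -1;
    if (waitpid(pid, &status, WCONTINUED) != pid)
        return -1;
    return WIFCONTINUED(status) ? 0 : -1;
}
```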
To implement the pause on a thread, you need to make it wait for some event to happen. Waiting on a spin-lock mutex wastes CPU cycles. IMHO, this method should not be followed, as those CPU cycles could have been used by other processes/threads.
Wait on a non-blocking descriptor (pipe, socket or some other). Example code for using pipes for inter-thread communication can be seen here
The above solution is useful if your second thread receives information from multiple sources, not just the pause and resume signals. A top-level select/poll/epoll can be used on non-blocking descriptors. You can specify a wait time for the select/poll/epoll system calls, and the thread sleeps in the kernel, without using CPU, until a descriptor is ready or the timeout expires.
I mention this solution with forward-thinking that your second thread will have more things or events to handle than just getting paused and resumed. Sorry if it is more detailed than what you asked.
Another simpler approach can be to have a shared boolean variable between these threads.
The main thread is the writer of the variable: 0 signifies stop, 1 signifies resume.
The second thread only reads the value of the variable. To implement the '0' state, use usleep for some microseconds and then check the value again, assuming a delay of a few microseconds is acceptable in your design.
To implement '1', check the value of the variable after doing a certain number of operations.
Otherwise, you can also implement a signal for moving from '1' to '0' state.
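A minimal sketch of the shared-variable approach, using a C11 atomic_int rather than a plain variable (a plain int read and written by two threads without synchronization is a data race). The names are hypothetical, and nanosleep() stands in for usleep(), which POSIX.1-2008 removed.

```c
#define _XOPEN_SOURCE 700
#include <stdatomic.h>
#include <time.h>

/* 1 = run, 0 = paused. Written by the main thread, read by the worker. */
static atomic_int run_state = 1;

static void set_paused(int paused)
{
    atomic_store(&run_state, paused ? 0 : 1);
}

/* Worker side: while the state is 0, sleep for sleep_us microseconds and check again.
   sleep_us is the extra latency you accept when resuming. */
static void wait_while_paused(long sleep_us)
{
    const struct timespec nap = { sleep_us / 1000000, (sleep_us % 1000000) * 1000 };

    while (atomic_load(&run_state) == 0)
        nanosleep(&nap, NULL);
}
```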
You can use a mutex to do that; in code it would be:
pthread_mutex_t my_lock = PTHREAD_MUTEX_INITIALIZER; /* shared by both threads */
while (1) {
/* pause/resume point: while thread1 holds my_lock, thread2 blocks here */
pthread_mutex_lock(&my_lock);
pthread_mutex_unlock(&my_lock); /* unlock so thread1 can lock it again to pause */
/* do actual work here */
}
You can suspend a thread simply by using a signal:
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>
static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static void thread_control_handler(int n, siginfo_t* siginfo, void* sigcontext) {
// block here until thread_suspend() releases the mutex
// (caveat: pthread_mutex_lock() is not async-signal-safe)
pthread_mutex_lock(&mutex);
pthread_mutex_unlock(&mutex);
}
// suspend the thread `tid` for `time` seconds
void thread_suspend(pthread_t tid, int time) {
struct sigaction act;
memset(&act, 0, sizeof(act));
act.sa_sigaction = thread_control_handler;
act.sa_flags = SA_RESTART | SA_SIGINFO | SA_ONSTACK;
sigemptyset(&act.sa_mask);
if (!sigaction(SIGURG, &act, NULL)) {
pthread_mutex_lock(&mutex);
pthread_kill(tid, SIGURG); // kill() would take a process ID, not a thread
sleep(time);
pthread_mutex_unlock(&mutex);
}
}
Not sure if you will like my answer or not, but you can achieve it this way.
If it is a separate process instead of a thread, I have a solution using signals (this might even work for a thread; maybe someone can share their thoughts).
There is no system currently in place to pause or resume the execution of the processes. But surely you can build one.
Steps I would follow if I wanted this in my project:
Register a signal handler for the second process.
Inside the signal handler, wait on a semaphore.
Whenever you want to pause the other process, just send it the signal you registered the handler for. The program will go into a sleep state.
When you want to resume the process, send a different signal. Inside that signal handler, check whether the semaphore is locked. If it is, release the semaphore, so process 2 will continue its execution.
If you can implement this, please do share your feedback, whether it worked for you or not. Thanks.
I'm having a problem with the combined use of execl() and pthread.
My idea is quite simple: write a daemon that in certain situations starts an external process (a separate executable with respect to the daemon itself) and waits for the return value of that process. Moreover, I want to be able to start multiple instances of the same process at the same time.
The part of my code that handles multiple threads:
...
for (c_thread=0,i=0;i<N;i++)
{
/* Start actions before start threads */
for (j=c_thread;j<c_thread+config.max_threads;j++)
Before_Process(act[act_index[j]].measID);
/* Now create threads */
for (c=0,j=c_thread;j<c_thread+config.max_threads;j++)
{
Print_Log(LOG_DEBUG,"Create thread n. %d, measurementID=%s",c,act[act_index[j]].measID);
if ((ret=pthread_create(&pth[c],NULL,Start_Process_Thread,(void *) &act[act_index[j]].measID)))
{
Print_Log(LOG_ERR,"Error in creating thread (errorcode: %d)",ret);
exit(EXIT_FAILURE);
}
c++;
}
/* Join threads */
for (j=0;j<config.max_threads;j++)
{
if ((ret=pthread_join(pth[j], (void**) &r_value[j])))
{
Print_Log(LOG_ERR,"Error in joining thread (errorcode: %d)",ret);
exit(EXIT_FAILURE);
}
}
/* Perform actions after the thread */
for (j=0;j<config.max_threads;j++)
{
status=*(int*) r_value[j];
Print_Log(LOG_DEBUG,"Joined thread n. %d. Return value=%d",j,status);
After_Process(act[act_index[c_thread+j]].measID,status);
}
c_thread += config.max_threads;
}
...
And the function Start_Process_Thread:
void *Start_Process_Thread(void *arg)
{
int *ret;
char *measID;
measID=(char*)arg;
if (!(ret=malloc(sizeof(int))))
{
Print_Log(LOG_ERR, "allocation memory failed, code=%d (%s)",
errno, strerror(errno) );
exit(EXIT_FAILURE);
}
*ret=Start_Process(measID);
pthread_exit(ret);
}
int Start_Process(char *measID)
{
...
pipe(pfd);
pid=fork();
if (!pid)
{
signal(SIGALRM,Timeout);
alarm(config.timeout_process);
flag=0;
/*
Start the Process.
*/
ret=execl(config.pre_processor,buff_list[TokCount-1],config.db_name,measID,(char *) 0);
if (ret==-1)
{
alarm(0);
flag=1;
Print_Log(LOG_ERR,"Cannot run script %s, code=%d (%s)",config.process, errno, strerror(errno));
}
alarm(0);
close(1);
close(pfd[0]);
dup2(pfd[1],1);
write(1,&flag,sizeof(int));
}
else
{
wait(&status);
close(pfd[1]);
read(pfd[0],&flag,sizeof(int));
close(pfd[0]);
if (!flag)
{
if (WIFEXITED(status))
{
if (!(return_value=WEXITSTATUS(status)))
{
/*
Process gives no errors.
*/
Print_Log(LOG_INFO, "Processing of measurementID=%s ended succesfully!",measID);
}
else
{
/*
Process gives errors.
*/
Print_Log(LOG_WARNING,"Processor failed for measurementID=%s, code=%d",measID, return_value);
}
}
else
{
/*
Timeout for Process
*/
Print_Log( LOG_WARNING,"Timeout occurred in processing measurementID=%s",measID);
return_value=255;
}
}
}
}
The above code works fine from a technical point of view, but I have a problem somewhere in handling the return values of the different instances of the called external process. In particular, the return value associated with a certain instance is sometimes attributed to a different one at random.
For example, suppose 4 different instances of the external process are called with the arguments meas1, meas2, meas3 and meas4 respectively, and suppose that meas1, meas2 and meas3 are successfully processed and that for meas4 the process fails. In a situation like that, my code mixes up the return values, giving success for meas1, meas3 and meas4 and failure for meas2, or success for meas1, meas2 and meas4 and failure for meas3.
Any idea on why this can happens?
Any help is really welcome.
Thank you in advance for your attention.
When any thread in a process executes wait(), it gets the information about any of the process's dead children — not necessarily about the last child started by the thread that is doing the waiting.
You are going to need to think about:
Capturing the PID of the process that died (it is returned by wait(), but you ignore that).
Having a single thread designated as the 'disposer of corpses' (a thread that does nothing but wait() and record and report on deaths in the family of child processes).
A data structure that allows the threads that start processes to record that they are interested in the status of the child when it dies. Presumably, each such thread should wait on a suitable condition once its child starts, so that it is not consuming CPU time doing nothing useful.
The 'disposer of corpses' thread handles notifications of the appropriate other thread whenever it collects a corpse.
Worry about timeouts on the processes, and killing children who run wild for too long.
It's a morbid business at times...
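A simpler alternative, and a different technique from the 'disposer of corpses' thread, is for each thread to reap only its own child with waitpid(pid, ...) instead of wait(). A minimal sketch, where run_and_wait() is a hypothetical stand-in for the question's Start_Process(); it returns 255 when the child died from a signal (as with the question's SIGALRM timeout). This only holds as long as no other code in the process calls wait() or waitpid(-1, ...).

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] with arguments argv and wait for that child only.
   Returns its exit status, 255 if it died from a signal, or -1 on error. */
static int run_and_wait(char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        execv(argv[0], argv);
        _exit(127);                   /* reached only if execv() failed */
    }
    /* Waiting on this pid means another thread's child can never be collected here. */
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : 255;
}
```

In the question's Start_Process() this amounts to replacing wait(&status) with waitpid(pid, &status, 0).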
After a fork() call, I have a father that must send SIGUSR1 or SIGUSR2 (based on the value of the 'cod' variable) to his child. The child has to install the proper handlers before receiving SIGUSR1 or SIGUSR2. To do so, I pause the father, waiting for the child to signal him that it is done installing the handlers. The father is signaled with SIGUSR1, and the handler for this signal is installed before the fork call. However, it seems the father never returns from pause(), which makes me think that the SIGUSR1 handler is never called.
[...]
typedef enum{FALSE, TRUE} boolean;
boolean sigusr1setted = FALSE;
boolean sigusr2setted = FALSE;
void
sigusr1_handler0(int signo){
return;
}
void
sigusr1_handler(int signo){
sigusr1setted = TRUE;
}
void
sigusr2_handler(int signo){
sigusr2setted = TRUE;
}
int main(int argc, char *argv[]){
[...]
if(signal(SIGUSR1, sigusr1_handler0) == SIG_ERR){
perror("signal 0 error");
exit(EXIT_FAILURE);
}
pid = fork();
if (pid == 0){
if(signal(SIGUSR1, sigusr1_handler) == SIG_ERR){
perror("signal 1 error");
exit(EXIT_FAILURE);
}
if(signal(SIGUSR2, sigusr2_handler) == SIG_ERR){
perror("signal 2 error");
exit(EXIT_FAILURE);
}
kill(getppid(), SIGUSR1); // wake up parent by signaling him with sigusr1
// Wait for the parent to send the signals...
pause();
if(sigusr1setted){
if(execl("Prog1", "Prog1", (char*)0) < 0){
perror("exec P1 error");
exit(EXIT_FAILURE);
}
}
if(sigusr2setted){
if(execl("Prog2", "Prog2", (char*)0) < 0){
perror("exec P2 error");
exit(EXIT_FAILURE);
}
}
// Shouldn't reach this point: something went wrong...
exit(EXIT_FAILURE);
}else if (pid > 0){
// The father must wake only after the child has done with the handlers installation
pause();
// Never reaches this point ...
if (cod == 1)
kill(pid, SIGUSR1);
else
kill(pid, SIGUSR2);
// Wait for the child to complete..
if(wait(NULL) == -1){
perror("wait 2 error");
exit(EXIT_FAILURE);
}
[...]
}else{
perror("fork 2 error");
exit(EXIT_FAILURE);
}
[...]
exit(EXIT_SUCCESS);
}
Assembling a plausible answer from the comments - so this is Community Wiki from the outset. (If Oli provides an answer, up-vote that instead of this!)
Oli Charlesworth gave what is probably the core of the problem:
I suspect you have produced a race condition in the opposite direction to what you anticipated. The child sent SIGUSR1 to the parent before the parent reached the pause().
ouah noted accurately:
An object shared between the signal handler and the non-handler code (your boolean objects) must have volatile sig_atomic_t type; otherwise the behavior is undefined.
That said, POSIX allows a little more laxity than standard C does for what can be done inside a signal handler. We might also note that C99 provides <stdbool.h> to define the bool type.
The original poster commented:
I don't see how I can make sure that the parent goes into the pause() call first without using sleep() in the child (which guarantees nothing). Any ideas?
Suggestion: Use usleep() (µ-sleep, or sleep in microseconds), or nanosleep() (sleep in nanoseconds)?
Or use a different synchronization mechanism, such as:
parent process creates FIFO;
fork();
child opens FIFO for writing (blocking until there is a reader);
parent opens FIFO for reading (blocking until there is a writer);
when unblocked because the open() calls return, both processes simply close the FIFO;
the parent removes the FIFO.
Note that there is no data communication between the two processes via the FIFO; the code is simply relying on the kernel to block the processes until there is a reader and a writer, so both processes are ready to go.
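The FIFO handshake above might look like this minimal sketch; fork_after_setup() and its arguments are hypothetical. One caveat: if the child dies before it opens the FIFO, the parent's open() blocks forever, which a real program would have to guard against.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Like fork(), but the parent does not return until the child has run
   child_setup() (for example, installed its signal handlers).
   Returns the child's pid in the parent, 0 in the child, -1 on error.
   fifo_path must not exist yet; the parent removes it again. */
static pid_t fork_after_setup(const char *fifo_path, void (*child_setup)(void))
{
    pid_t pid;
    int fd;

    if (mkfifo(fifo_path, 0600) == -1)
        return -1;
    pid = fork();
    if (pid == -1) {
        unlink(fifo_path);
        return -1;
    }
    if (pid == 0) {
        child_setup();
        fd = open(fifo_path, O_WRONLY);   /* blocks until the parent opens for reading */
        if (fd == -1)
            _exit(EXIT_FAILURE);
        close(fd);
        return 0;
    }
    fd = open(fifo_path, O_RDONLY);       /* blocks until the child opens for writing */
    if (fd != -1)
        close(fd);
    unlink(fifo_path);
    return fd == -1 ? -1 : pid;
}
```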
Another possibility is that the parent process could try if (sigusr1setted == FALSE) pause(); to reduce the window for the race condition. However, it only reduces the window; it does not guarantee that the race condition cannot occur. That is, Murphy's Law applies and the signal could arrive between the time the test is complete and the time the pause() is executed.
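The window can in fact be closed completely with a technique not mentioned above: block SIGUSR1 before fork(), then wait with sigsuspend(), which unblocks the signal and suspends in one atomic step, so a signal sent early simply stays pending until the parent is ready for it. A minimal sketch, assuming a single-threaded program (hence sigprocmask()); child_ready and fork_and_wait_ready() are hypothetical names.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

static volatile sig_atomic_t child_ready;

static void on_child_ready(int signo)
{
    (void)signo;
    child_ready = 1;
}

/* Like fork(), but the parent returns only after the child has sent SIGUSR1,
   however early it sends it. child_setup() runs in the child first.
   Returns the child's pid in the parent, 0 in the child, -1 on error. */
static pid_t fork_and_wait_ready(void (*child_setup)(void))
{
    struct sigaction sa;
    sigset_t block, old, wait_mask;
    pid_t pid;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_child_ready;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) == -1)
        return -1;

    /* Block SIGUSR1 before fork(): an early signal then stays pending. */
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &block, &old) == -1)
        return -1;
    wait_mask = old;
    sigdelset(&wait_mask, SIGUSR1);
    child_ready = 0;

    pid = fork();
    if (pid == 0) {
        sigprocmask(SIG_SETMASK, &old, NULL);   /* child: original mask back */
        child_setup();
        kill(getppid(), SIGUSR1);               /* note the argument order: (pid, sig) */
        return 0;
    }
    if (pid > 0) {
        while (!child_ready)
            sigsuspend(&wait_mask);   /* unblock SIGUSR1 and sleep, atomically */
    }
    sigprocmask(SIG_SETMASK, &old, NULL);
    return pid;
}
```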
All of this goes to show that signals are not a very good IPC mechanism. They can be used for IPC, but they should seldom actually be used for synchronization.
Incidentally, there's no need to test the return value of any of the exec*() family of functions. If the system call returns, it failed.
And the questioner asked again:
Wouldn't it be better to use POSIX semaphores shared between processes?
Semaphores would certainly be another valid mechanism for synchronizing the two processes. Since I'd certainly have to look at the manual pages for semaphores whereas I can remember how to use FIFOs without looking, I'm not sure that I'd actually use them, but creating and removing a FIFO has its own set of issues so it is not clear that it is in any way 'better' (or 'worse'); just different. It's mkfifo(), open(), close(), unlink() for FIFOs versus sem_open() (or sem_init()), sem_post(), sem_wait(), sem_close(), and maybe sem_unlink() (or sem_destroy()) for semaphores. You might want to think about registering a 'FIFO removal' or 'semaphore cleanup' function with atexit() to make sure the FIFO or semaphore is destroyed under as many circumstances as possible. However, that's probably OTT for a test program.
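For comparison, the semaphore route using sem_init() might look like this minimal sketch: an unnamed semaphore placed in anonymous shared memory before fork() is visible to both processes. shared_sem_create() and shared_sem_destroy() are hypothetical helpers; MAP_ANONYMOUS is supported by Linux and the BSDs but was not part of POSIX.1-2008, and older glibc versions need -pthread for the sem_* functions.

```c
#define _DEFAULT_SOURCE
#include <semaphore.h>
#include <stddef.h>
#include <sys/mman.h>

/* Create an unnamed semaphore in shared memory, so that it is shared by the
   parent and every child forked afterwards. Returns NULL on error. */
static sem_t *shared_sem_create(unsigned int initial)
{
    sem_t *s = mmap(NULL, sizeof *s, PROT_READ | PROT_WRITE,
                    MAP_SHARED | MAP_ANONYMOUS, -1, 0);

    if (s == MAP_FAILED)
        return NULL;
    if (sem_init(s, 1, initial) == -1) {   /* pshared = 1: usable across processes */
        munmap(s, sizeof *s);
        return NULL;
    }
    return s;
}

static void shared_sem_destroy(sem_t *s)
{
    sem_destroy(s);
    munmap(s, sizeof *s);
}
```

The child would call sem_post() once its handlers are installed, and the parent sem_wait() (retrying on EINTR) before sending its signal; unlike pause(), a post made early is not lost.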
I discovered an issue with my thread implementation that seems strange to me. It would be great if some of you could explain it to me.
I am working on something like a proxy: a program (running on different machines) that receives packets over eth0 and sends them through ath0 (wireless) to another machine that does exactly the same thing. I am not at all sure what is causing my problem, because I am new to everything, Linux and C programming alike.
I start two threads:
one listens (socket) on eth0 for incoming packets and sends them out through ath0 (also a socket);
the other listens on ath0 and sends through eth0.
If I use threads, I get an error like this:
sh-2.05b# ./socketex
Failed to send network header packet.
: Interrupted system call
If I use fork(), the program works as expected.
Can someone explain that behaviour to me?
To show the sender implementation, here is its code snippet:
while(keep_going) {
    memset(&buffer[0], '\0', sizeof(buffer));
    recvlen = recvfrom(sockfd_in, buffer, BUFLEN, 0, (struct sockaddr *) &incoming, &ilen);
    if(recvlen < 0) {
        perror("something went wrong / incoming\n");
        exit(-1);
    }
    strcpy(msg, buffer);
    buflen = strlen(msg);
    sentlen = ath_sendto(sfd, &btpinfo, &addrnwh, &nwh, buflen, msg, &selpv2, &depv);
    if(sentlen == E_ERR) {
        perror("Failed to send network header packet.\n");
        exit(-1);
    }
}
UPDATE: my main file, which starts either threads or processes (fork):
int main(void) {
    port_config pConfig;
    memset(&pConfig, 0, sizeof(pConfig));
    pConfig.inPort = 2002;
    pConfig.outPort = 2003;
    pid_t retval = fork();
    if(retval == 0) {
        // child process
        pc2wsuThread((void *) &pConfig);
    } else if (retval < 0) {
        perror("fork not successful\n");
    } else {
        // parent process
        wsu2pcThread((void *) &pConfig);
    }
    /*
    wint8 rc1, rc2 = 0;
    pthread_t pc2wsu;
    pthread_t wsu2pc;
    rc1 = pthread_create(&pc2wsu, NULL, pc2wsuThread, (void *) &pConfig);
    rc2 = pthread_create(&wsu2pc, NULL, wsu2pcThread, (void *) &pConfig);
    if(rc1) {
        printf("error: pthread_create() is %d\n", rc1);
        return(-1);
    }
    if(rc2) {
        printf("error: pthread_create() is %d\n", rc2);
        return(-1);
    }
    pthread_join(pc2wsu, NULL);
    pthread_join(wsu2pc, NULL);
    */
    return 0;
}
Does it help?
update 05/30/2011
-sh-2.05b# ./wsuproxy 192.168.1.100
mgmtsrvc
mgmtsrvc
Failed to send network header packet.
: Interrupted system call
13.254158,75.165482,DATAAAAAAmgmtsrvc
mgmtsrvc
mgmtsrvc
I still get the interrupted system call, as you can see above.
I blocked all signals as follows:
sigset_t signal_mask;
sigfillset(&signal_mask);
sigprocmask(SIG_BLOCK, &signal_mask, NULL);
The two threads work on the same interfaces, but on different ports. The problem still seems to appear in the same place (see the first code snippet). I can't get any further and don't know enough to solve this problem. Maybe some of you can help me here again.
Thanks in advance.
EINTR does not itself indicate an error. It means that your process received a signal while it was in the sendto syscall, and that syscall hadn't sent any data yet (that's important).
You could retry the send in this case, but it would be a good idea to figure out which signal caused the interruption. If this is reproducible, try using strace.
If you're the one sending the signal, well, you know what to do :-)
Note that on Linux, sendto (and some other blocking calls) can fail with EINTR even if you haven't installed a signal handler yourself. For sendto, this happens when both of the following are true:
the process is stopped (via SIGSTOP, for example) and then resumed (with SIGCONT) while it is blocked in the call;
you have set a send timeout on the socket (via SO_SNDTIMEO).
See the signal(7) man page (at the very bottom) for more details.
So if you're "suspending" your service (or something else is), that EINTR is expected and you should restart the call.
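A minimal sketch of restarting the call, using plain sendto(). The wrapper name sendto_retry() is hypothetical. The question's ath_sendto() comes from a third-party library whose error reporting is not shown, so whether it can be wrapped the same way (that is, whether it sets errno to EINTR) is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Call sendto() again whenever it fails with EINTR. For this error,
 * no data has been sent yet, so repeating the call is safe. */
ssize_t sendto_retry(int fd, const void *buf, size_t len, int flags,
                     const struct sockaddr *dest, socklen_t destlen)
{
    ssize_t n;
    do {
        n = sendto(fd, buf, len, flags, dest, destlen);
    } while (n == -1 && errno == EINTR);
    return n;
}
```

If a signal handler is used to request shutdown (as the question's keep_going flag might be), check that flag inside the loop before retrying, so that a blind retry doesn't swallow the request.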
If you are using threads with signals, keep in mind that a signal sent to the process can be delivered to any thread whose signal mask does not block it. That means that if you have blocked incoming signals in one thread but not in another, the non-blocking thread will receive the signal. If no signal handler is set up for that signal, you will end up with the signal's default behavior for the entire process (i.e., all the threads, both signal-blocking and non-signal-blocking). For instance, if the default behavior of a signal is to terminate the process, one thread receiving that signal and executing its default behavior will terminate the entire process, for all the threads, even though some threads may have been masking the signal. Also, if two threads are not blocking a signal, it is not deterministic which thread will handle it. Therefore mixing signals and threads is typically not a good idea, but there are exceptions to the rule.
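A minimal sketch of the first point: a signal sent to the whole process goes to whichever thread leaves it unblocked. Here the main thread blocks SIGUSR1 with pthread_sigmask(), and a worker waits with it unblocked. The function names and the choice of SIGUSR1 are assumptions made for the demonstration.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Each thread has its own copy; only the thread running the handler sets it. */
static _Thread_local volatile sig_atomic_t handled_here = 0;

static void on_usr1(int sig)
{
    (void)sig;
    handled_here = 1;
}

static void *worker(void *arg)
{
    (void)arg;
    sigset_t unblocked;
    /* The inherited mask blocks SIGUSR1; wait with SIGUSR1 removed from it. */
    pthread_sigmask(SIG_SETMASK, NULL, &unblocked);
    sigdelset(&unblocked, SIGUSR1);
    while (!handled_here)
        sigsuspend(&unblocked);
    return (void *)1;
}

/* Returns 1 if a process-directed SIGUSR1 was handled by the worker
 * (the only thread not blocking it) and not by the main thread. */
int demo_thread_delivery(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) == -1)
        return -1;

    sigset_t set, old;
    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    pthread_sigmask(SIG_BLOCK, &set, &old);   /* main thread blocks SIGUSR1 */

    pthread_t tid;
    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;

    kill(getpid(), SIGUSR1);   /* sent to the process, like `kill <pid>` */

    void *res;
    pthread_join(tid, &res);
    int main_got_it = handled_here;   /* the main thread's own copy */
    pthread_sigmask(SIG_SETMASK, &old, NULL);
    return (res == (void *)1 && !main_got_it) ? 1 : 0;
}
```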
One thing you can try, since a spawned thread inherits its signal mask from the thread that creates it, is to create a daemon thread for handling signals. At the start of your program, block all incoming signals (or at least all non-important ones), and then spawn your threads. Those spawned threads will then not have any signal in the inherited blocked mask delivered to them. If you need to handle some specific signals, still make them part of the blocked mask before spawning the threads, but leave one thread (it could even be the main thread, after it has spawned all the worker threads) as a "daemon" thread that waits for those specific, now blocked, signals using sigwait(). That thread then dispatches whatever functions are necessary when a given signal is received by the process. This prevents signals from interrupting system calls in your worker threads, yet still allows you to handle signals. (SIGKILL and SIGSTOP are the exceptions: they can never be blocked, caught, or waited for.)
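A minimal sketch of that pattern with a dedicated thread, assuming SIGINT and SIGTERM are the signals of interest. The names start_signal_thread() and signal_thread() are hypothetical. Because the mask is set with pthread_sigmask() before any thread is created, every later thread inherits it, and only the sigwait() thread ever sees these signals.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <stdint.h>
#include <unistd.h>

static sigset_t handled_signals;

/* The one thread that deals with SIGINT/SIGTERM, synchronously. */
static void *signal_thread(void *arg)
{
    (void)arg;
    int sig = 0;
    if (sigwait(&handled_signals, &sig) != 0)
        return NULL;
    /* A real daemon might set a shutdown flag or notify workers here,
     * so that a worker can finish and release its locks before exiting. */
    return (void *)(intptr_t)sig;
}

/* Block the signals first, then create threads: all of them inherit the mask. */
int start_signal_thread(pthread_t *tid)
{
    sigemptyset(&handled_signals);
    sigaddset(&handled_signals, SIGINT);
    sigaddset(&handled_signals, SIGTERM);
    if (pthread_sigmask(SIG_BLOCK, &handled_signals, NULL) != 0)
        return -1;
    /* ... worker threads would be created here, after the mask is set ... */
    return pthread_create(tid, NULL, signal_thread, NULL) == 0 ? 0 : -1;
}
```

This only helps with catchable signals such as SIGTERM, which is what kill sends by default. A SIGKILL still terminates the process immediately.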
The reason your forked version may not be having issues is that a signal sent to the parent process is not propagated to its child processes. So, if you can, find out which signal is interrupting your system call, block it in your threaded version, and, if you need to handle it, create a daemon thread that handles that signal's arrival while the rest of the threads block it.
Finally, if you don't have access to external libraries, debuggers, etc. to see which signals are arriving, you can set up a simple procedure to find out. You can try this code:
#include <signal.h>
#include <stdio.h>
int main()
{
    //block all incoming signals
    sigset_t signal_mask;
    sigfillset(&signal_mask);
    sigprocmask(SIG_BLOCK, &signal_mask, NULL);
    //... spawn your threads here ...
    //... now wait for signals to arrive and see what comes in ...
    int arrived_signal;
    while(1) //you can change this condition to whatever to exit the loop
    {
        sigwait(&signal_mask, &arrived_signal);
        switch(arrived_signal)
        {
            case SIGABRT: fprintf(stderr, "SIGABRT signal arrived\n"); break;
            case SIGALRM: fprintf(stderr, "SIGALRM signal arrived\n"); break;
            //continue for the rest of the signals defined in signal.h ...
            default: fprintf(stderr, "Unrecognized signal arrived\n");
        }
    }
    //clean-up your threads and anything else needing clean-up
    return 0;
}
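Rather than extending the switch by hand for every signal, the body of the loop could print a description with strsignal() (POSIX.1-2008). A minimal sketch. The helper name describe_signal() is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Format a line such as "signal 15 (Terminated) arrived" into buf.
 * strsignal() supplies the human-readable description. */
int describe_signal(int sig, char *buf, size_t len)
{
    return snprintf(buf, len, "signal %d (%s) arrived\n", sig, strsignal(sig));
}
```

Inside the sigwait() loop this could replace the switch: `char line[128]; describe_signal(arrived_signal, line, sizeof line); fputs(line, stderr);`.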