Accept() and SIGUSR1 are being tripped accidentally - c

Hi I'm building a program that uses a signal handler shown below ...
struct sigaction pipeIn;
pipeIn.sa_handler = updateServer;
sigemptyset(&pipeIn.sa_mask);
pipeIn.sa_flags = SA_ONESHOT;
if(sigaction(SIGUSR1, &pipeIn, NULL) == -1){
printf("We have a problem, sigaction is not working.\n");
perror("\n");
exit(1);
}
The problem is that this handler is getting tripped when it's not supposed to. The only thing that should send the SIGUSR1 signal is my child process which exists inside an infinite while loop which listens for incoming connections. The child process is forked as you can see below. I redo the pipeIn handler to run a different function that the child process uses which the parent does not. The code is shown below.
while(1){
newSock = accept(listenSock,(struct sockaddr *)&their_addr,&addr_size);
printf("A\n");
if(!fork()){
// We want to redefine the interrupt
pid_t th;
th = getpid();
printf("child pid: %d\n",th);
pipeIn.sa_handler = setFlag;
if(sigaction(SIGUSR1, &pipeIn, NULL) == -1){
printf("We have a problem, sigaction is not working.\n");
perror("\n");
exit(1);
}
close(listenSock);
kill(getppid(),SIGUSR1);
waitForP();
close(newSock);
exit(0);
}
close(newSock);
//waitForP();
//break;
}
When I run this code, I connect to this server program from another computer. It accept()s that one request just fine, and the child process eventually gets around to sending SIGUSR1 to the parent process. However, the parent process receives a SIGUSR1 signal before the child process even gets to send it, so the handler runs before it should. Then the child process finally sends the signal and the handler goes off a second time. Lastly, accept() returns again even though no new connections are being made, and the incoming IP address is a weird, random-looking IPv6 address. I don't know what's going on. Any help would be great.

Repeat after me: always check for errors returned from a system call (which accept(2) is). You are getting -1 instead of a socket descriptor, EINTR in errno, and an undefined connecting address.
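A minimal sketch of that check, assuming a hypothetical helper named accept_checked (it is not part of the question's code). It retries when a signal handler interrupts the call and reports any other failure. In the question's server, the parent might instead look at a flag set by its SIGUSR1 handler before retrying.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: accept a connection, retrying when a signal
   interrupts the call. Returns the new descriptor, or -1 with errno set. */
int accept_checked(int listen_fd, struct sockaddr *addr, socklen_t *len)
{
    /* len is a value-result argument, so remember the caller's size */
    socklen_t initial_len = len ? *len : 0;

    for (;;) {
        int fd = accept(listen_fd, addr, len);
        if (fd >= 0)
            return fd;              /* success: addr and len are valid */
        if (errno == EINTR) {
            if (len)
                *len = initial_len; /* reset before trying again */
            continue;               /* a handler ran; nothing was accepted */
        }
        int saved_errno = errno;    /* perror() may change errno */
        perror("accept");
        errno = saved_errno;
        return -1;                  /* addr is NOT valid in this case */
    }
}
```

In the loop from the question, newSock = accept_checked(listenSock, (struct sockaddr *)&their_addr, &addr_size); followed by a test for -1 would keep the server from forking a child for a connection that never arrived.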

This might seem obvious, but have you compiled your code with full warnings on and made sure there are none? Mysterious behavior is often caused by errors that the C compiler only mentions if you ask it to.

Related

Parent process in server stuck in accept call. How do I terminate the parent cleanly from child

Hello from a beginner C programmer.
I have a simple server client setup. I only want one client to connect to the server, but I want other clients to be able to try and get a message that the server is occupied.
I am able to connect to the server with one client, and let other clients trying to connect know there is no room. My problem occurs when the client tells the server to shut down. The child process is able to break out of the loops and terminate. The parent, however, is not able to receive the message from the child using pipe, because it is stuck on accept.
I could use kill(2) to end the parent, but will I get a clean termination with closing of sockets and files then?
I have also tried to let the parent not stop at accept using fcntl(sock_desc, F_SETFL, fcntl(sock_desc, F_GETFL, 0) | O_NONBLOCK); but this opens up new problems.
I want to somehow make the child tell the parent to skip the accept line and continue so that it gets the pipe message and exits the loop.
If this is a bad way to terminate servers I would appreciate to learn about that.
Simplified server code:
void termination_handler (int signum)
{
if(signum == SIGTERM){
//Is this where the accept call is changed?
}
}
void main(){
struct sigaction sa = {0}; //2b) Initialise the struct sigaction variable to all 0s beforehand.
sa.sa_handler = termination_handler; //2a) Set the member for handler (to the signal-handler function)
sigaction(SIGTERM, &sa, NULL);
pid_t pid;
int loop = 1;
while(loop){
int sock = accept(net_sock, NULL, NULL); //After connection
//parent is stuck here
if(kill(pid,0) == -1){
pid = fork();
}
else{
//Tell second client there is no room and close socket
}
//Child
if(pid == 0){
while(loop){
//Read signal in from client to end child loop and terminate child
//Write with pipe to parent to end parent loop and terminate parent
kill(getppid(), SIGTERM); // Is this how I do it?
}
}
//Parent
else{
close(sock);
//Read with pipe from child to end loop
//After first connection, the parent won't get this message
}
}
The OS will close file descriptors for you. Unless you have other cleanup work to do (such as writing into files or removing some files), a kill with an unhandled terminating signal (e.g., SIGTERM or SIGINT) should be sufficient.
If you do have other cleanup work to do, have the child signal the parent with a signal for which the parent has a signal handler established (you need to establish the handler with sigaction). That will break accept with return code -1 and errno == EINTR, allowing you to do whatever you need to do.
volatile sig_atomic_t usr1 = 0;
void usr1_handler(int Sig) { usr1 = 1; }
//...
int main() { //...
sigaction(SIGUSR1, &(struct sigaction){.sa_handler=usr1_handler},0);
//...
usr1 = 0;
sock = accept( /*... */ );
if ( -1 == sock && EINTR == errno && usr1 ) //was interrupted by USR1
/* cleanup and exit */;
Let the child signal its parent before ending. If done correctly, accept() returns on signal reception, returning -1 and setting errno to EINTR.
From accept()'s documentation:
RETURN VALUE
Upon successful completion, accept() shall return the non-negative file descriptor of the accepted socket. Otherwise, -1 shall be returned, errno shall be set to indicate the error, [...]
[...]
ERRORS
The accept() function shall fail if:
[...]
[EINTR]
The accept() function was interrupted by a signal that was caught before a valid connection arrived.
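To see the EINTR behaviour in isolation, here is a rough, self-contained sketch. Everything in it is an assumption made for the demonstration: the function name accept_interrupted_demo, the loopback listener on an arbitrary free port, and a child that keeps sending SIGUSR1 (so that a signal arriving before the parent reaches accept() cannot leave it blocked forever). The handler is installed without SA_RESTART, which is what makes accept() fail instead of resuming.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t got_usr1 = 0;

static void on_usr1(int sig) { (void)sig; got_usr1 = 1; }

/* Returns 1 if accept() was interrupted by SIGUSR1 from the child, else 0. */
int accept_interrupted_demo(void)
{
    struct sockaddr_in addr;
    struct sigaction sa;
    int interrupted = 0;

    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0)
        return 0;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                  /* any free port; nobody connects */
    if (bind(lfd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(lfd, 1) < 0) {
        close(lfd);
        return 0;
    }

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                    /* no SA_RESTART: accept() gets EINTR */
    sigaction(SIGUSR1, &sa, NULL);

    got_usr1 = 0;
    pid_t parent = getpid();
    pid_t child = fork();
    if (child < 0) {
        close(lfd);
        return 0;
    }
    if (child == 0) {
        /* Child: signal the parent repeatedly, for at most about 10 s. */
        struct timespec pause_20ms = {0, 20 * 1000 * 1000};
        for (int i = 0; i < 500; i++) {
            kill(parent, SIGUSR1);
            nanosleep(&pause_20ms, NULL);
        }
        _exit(0);
    }

    int fd = accept(lfd, NULL, NULL);   /* blocks until a signal arrives */
    if (fd == -1 && errno == EINTR && got_usr1)
        interrupted = 1;                /* this is where cleanup would go */
    else if (fd >= 0)
        close(fd);

    kill(child, SIGKILL);
    while (waitpid(child, NULL, 0) == -1 && errno == EINTR)
        ;                               /* a late SIGUSR1 may interrupt us */
    close(lfd);
    return interrupted;
}
```

Calling accept_interrupted_demo() from main and printing the result should show 1 on a POSIX system where the loopback interface is available.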

C: fork() inform parent when child process disconnects

I am doing a simple server/client program in C which listens on a network interface and accepts clients. Each client is handled in a forked process.
The goal I have is to let the parent process know, once a client has disconnected from the child process.
Currently my main loop looks like this:
for (;;) {
/* 1. [network] Wait for new connection... (BLOCKING CALL) */
fd_listen[client] = accept(fd_listen[server], (struct sockaddr *)&cli_addr, &clilen);
if (fd_listen[client] < 0) {
perror("ERROR on accept");
exit(1);
}
/* 2. [process] Call socketpair */
if ( socketpair(AF_LOCAL, SOCK_STREAM, 0, fd_comm) != 0 ) {
perror("ERROR on socketpair");
exit(1);
}
/* 3. [process] Call fork */
pid = fork();
if (pid < 0) {
perror("ERROR on fork");
exit(1);
}
/* 3.1 [process] Inside the Child */
if (pid == 0) {
printf("[child] num of clients: %d\n", num_client+1);
printf("[child] pid: %ld\n", (long) getpid());
close(fd_comm[parent]); // Close the parent socket file descriptor
close(fd_listen[server]); // Close the server socket file descriptor
// Tasks that the child process should be doing for the connected client
child_processing(fd_listen[client]);
exit(0);
}
/* 3.2 [process] Inside the Parent */
else {
num_client++;
close(fd_comm[child]); // Close the child socket file descriptor
close(fd_listen[client]); // Close the client socket file descriptor
printf("[parent] num of clients: %d\n", num_client);
while ( (w = waitpid(-1, &status, WNOHANG)) > 0) {
printf("[EXIT] child %d terminated\n", w);
num_client--;
}
}
}/* end of while */
It all works well, the only problem I have is (probably) due to the blocking accept call.
When I connect to the above server, a new child process is created and child_processing is called.
However when I disconnect with that client, the main parent process does not know about it and does NOT output printf("[EXIT] child %d terminated\n", w);
But, when I connect with a second client after the first client has disconnected, the main loop is able to finally process the while ( (w = waitpid(-1, &status, WNOHANG)) > 0) part and tell me that the first client has disconnected.
If there will be only ever one client connecting and disconnecting afterwards, my main parent process will never be able to tell if it has disconnected or not.
Is there any way to tell the parent process that my client already left?
UPDATE
As I am a real beginner with c, it would be nice if you provide some short snippets to your answer so I can actually understand it :-)
Your waitpid usage is not correct. You make a non-blocking call, so if the child has not finished yet, the call returns 0:
waitpid(): on success, returns the process ID of the child whose state
has changed; if WNOHANG was specified and one or more child(ren)
specified by pid exist, but have not yet changed state, then 0 is
returned. On error, -1 is returned.
So you drop out of the while loop immediately. Of course, this can be caught later, when the first child has terminated and a second one causes you to run waitpid again.
As you need a non-blocking call to wait, I suggest that you not manage termination directly but through the SIGCHLD signal, which lets you catch the termination of any child and then call waitpid appropriately in the handler:
void handler(int signal) {
    while (waitpid(...)) { // find an adequate condition and parameters for your needs
    }
}
...
struct sigaction act;
act.sa_flags = 0;
sigemptyset(&(act.sa_mask));
act.sa_handler = handler;
sigaction(SIGCHLD,&act,NULL);
... // now ready to receive SIGCHLD when at least one child changes its state
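One possible way to fill in that handler is sketched below. The condition waitpid(-1, NULL, WNOHANG) > 0, the counter, and the function name reap_children_demo are my assumptions, not something the answer specifies. The loop matters because several children exiting close together may produce only one SIGCHLD.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t reaped = 0;

static void on_sigchld(int sig)
{
    int saved_errno = errno;    /* waitpid() may overwrite errno */
    (void)sig;
    /* One SIGCHLD may stand for several exits: loop until none are left. */
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped++;
    errno = saved_errno;
}

/* Forks n children that exit at once; returns how many the handler reaped. */
int reap_children_demo(int n)
{
    struct sigaction sa, old_sa;
    sigset_t block, old, wait_mask;

    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGCHLD, &sa, &old_sa) == -1)
        return -1;

    /* Keep SIGCHLD blocked except while sleeping in sigsuspend(). */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old);
    wait_mask = old;
    sigdelset(&wait_mask, SIGCHLD);

    reaped = 0;
    int started = 0;
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(0);           /* child: terminate immediately */
        if (pid > 0)
            started++;
    }
    while (reaped < started)
        sigsuspend(&wait_mask); /* atomically unblock SIGCHLD and sleep */

    sigprocmask(SIG_SETMASK, &old, NULL);
    sigaction(SIGCHLD, &old_sa, NULL);
    return reaped;
}
```

The handler only calls waitpid and updates a sig_atomic_t counter, both of which are safe inside a signal handler; printing the "[EXIT]" message would still have to happen in the main loop.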
If I understand correctly, you want to be able to service multiple clients at once, and therefore your waitpid call is correct in that it does not block if no child has terminated.
However, the problem you then have is that you need to be able to process asynchronous child termination while waiting for new clients via accept.
Assuming that you're dealing with a POSIXy system, merely having a SIGCHLD handler established and having the signal unmasked (via sigprocmask, though IIRC it is unmasked by default), should be enough to cause accept to fail with EINTR if a child terminates while you are waiting for a new client to connect - and you can then handle EINTR appropriately.
The reason for this is that a SIGCHLD signal will be automatically sent to the parent process when a child process terminates. In general, system calls such as accept will return an error of EINTR ("interrupted") if a signal is received while they are waiting.
However, there would still be a race condition, where a child terminates just before you call accept (i.e. between your existing waitpid and accept calls). There are two main possibilities to overcome this:
Do all the child termination processing in your SIGCHLD handler, instead of the main loop. This may not be feasible, however, since there are significant limits to what you are allowed to do within a signal handler. You may not call printf for example (though you may use write).
I do not suggest you go down this path, although it may seem simpler at first it is the least flexible option and may prove unworkable later.
Write to one end of a non-blocking pipe in your SIGCHLD signal handler. Within the main loop, instead of calling accept directly, use poll (or select) to look for readiness on both the socket and the read end of the pipe, and handle each appropriately.
On Linux (and OpenBSD, I'm not sure about others) you can use ppoll to avoid the need to create a pipe (and in this case you should leave the signal masked, and have it unmasked during the poll operation; if ppoll fails with EINTR, you know that a signal was received, and you should call waitpid). You still need to set a signal handler for SIGCHLD, but it doesn't need to do anything.
Another option on Linux is to use signalfd to avoid both the need to create a pipe and set up a signal handler (I think). You should mask the SIGCHLD signal (using sigprocmask) if you use this. When poll (or equivalent) indicates that the signalfd is active, read the signal data from it (which clears the signal) and then call waitpid to reap the child.
On various BSD systems you can use kqueue instead of poll and watch for signals without needing to establish a signal handler.
On other POSIX systems you may be able to use pselect in a similar way to ppoll as described above.
There is also the option of using a library such as libevent to abstract away the OS-specifics.
The Glibc manual has an example of using select. Consult the manual pages for poll, ppoll, pselect for more information about those functions. There is an online book on using Libevent.
Rough example for using select, borrowed from Glibc documentation (and modified):
/* Set up a pipe and set signal handler for SIGCHLD */
int pipefd[2]; /* must be a global variable */
pipe(pipefd); /* TODO check for error return */
fcntl(pipefd[1], F_SETFL, O_NONBLOCK); /* set write end non-blocking */
/* signal handler */
void sigchld_handler(int signum)
{
char a = 0; /* write anything, doesn't matter what */
write(pipefd[1], &a, 1);
}
/* set up signal handler */
signal(SIGCHLD, sigchld_handler);
Where you currently have accept, you need to check status of the server socket and the read end of the pipe:
fd_set set, outset;
/* Initialize the file descriptor set. */
FD_ZERO (&set);
FD_SET (fd_listen[server], &set);
FD_SET (pipefd[0], &set);
for (;;) {
    outset = set; /* select() overwrites its argument, so pass a copy */
    if (select (FD_SETSIZE, &outset, NULL, NULL, NULL /* no timeout */) < 0) {
        if (errno == EINTR)
            continue; /* interrupted by a signal: just retry */
        perror("select");
        exit(1);
    }
    if (FD_ISSET(fd_listen[server], &outset)) {
        /* now do accept() etc */
    }
    if (FD_ISSET(pipefd[0], &outset)) {
        /* now do waitpid(), and read a byte from the pipe */
    }
}
Using other mechanisms is generally simpler, so I leave those as an exercise :)
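For what it's worth, here is a rough sketch of the pselect variant mentioned above, under some assumptions of mine: the function name pselect_reap_demo is made up, a pipe that nobody writes to stands in for the listening socket, and the child simply exits with a given code. SIGCHLD stays blocked except while pselect is sleeping, so a child that dies just before the call cannot be missed.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/select.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static void on_sigchld_noop(int sig) { (void)sig; } /* only needs to exist */

/* Forks a child that exits with `code`, sleeps in pselect() until SIGCHLD
   arrives, and returns the child's exit status (or -1 on failure). */
int pselect_reap_demo(int code)
{
    struct sigaction sa, old_sa;
    sigset_t block, old, during;
    int fds[2], status = 0, result = -1;

    if (pipe(fds) == -1)        /* stand-in for the listening socket */
        return -1;

    sa.sa_handler = on_sigchld_noop;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGCHLD, &sa, &old_sa);

    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old);
    during = old;               /* mask in effect while pselect() sleeps */
    sigdelset(&during, SIGCHLD);

    pid_t child = fork();
    if (child == 0)
        _exit(code);
    while (child > 0) {
        fd_set readable;
        FD_ZERO(&readable);
        FD_SET(fds[0], &readable);
        int n = pselect(fds[0] + 1, &readable, NULL, NULL, NULL, &during);
        if (n == -1 && errno != EINTR)
            break;              /* a real error */
        /* n > 0 would mean "socket ready": accept() would go here. */
        if (waitpid(child, &status, WNOHANG) == child) {
            if (WIFEXITED(status))
                result = WEXITSTATUS(status);
            break;
        }
    }

    sigprocmask(SIG_SETMASK, &old, NULL);
    sigaction(SIGCHLD, &old_sa, NULL);
    close(fds[0]);
    close(fds[1]);
    return result;
}
```

In the server from the question, the read set would contain fd_listen[server] instead of the pipe, and the waitpid call would be the existing WNOHANG loop.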

C code works in Mac (Darwin 13.4), but not in Linux (2.6.32)

I'm writing a toy shell program for a class and I did all of my coding on my mac (Darwin 13.4.0) and compiled using gcc <programname> -o <executablename>. Everything seems to run perfectly. Then I ftp the source code over to the school's Linux server and compile again, using the exact same compilation instruction, but on the Linux machine the code is buggy. In particular, the sigaction (signal handler) doesn't seem to be working properly all the time. It seems as though it isn't reliably catching the SIGCHLD signal. Edit--Actually, what was happening was the variable I was storing the status in was getting clobbered, so the incorrect status was displayed for foreground processes.
Anyone have any ideas why the change in OS might cause this kind of problem?
Here's what my signal handler code looks like:
void handleSignal(int signal){
int childid = 0;
int tempStatus = 0;
while ( (childid = waitpid(-1, &childStatus, WNOHANG)) > 0) {
/*Parse the exit status */
if(WIFEXITED(childStatus)){
childStatus = WEXITSTATUS(childStatus);
}
switch (signal) {
/*if the signal came from a child */
case SIGCHLD:
/*for background processes alert user */
if (childid != foregroundProcess){
printf("pid %i terminated:",childid);
showStatus(childStatus);
fflush(stdout);
}
/* for foreground children ending, just set the temp Status, in case*/
/* background children also need to be caught */
else {
tempStatus = childStatus;
}
break;
case SIGINT:
/*If there is a foreground child, send signal to it, else ignore. */
if (foregroundProcess){
kill(foregroundProcess, signal);
}
break;
default:
printf("Some other signal was received: code %i\n", signal);
fflush(stdout);
}
}
childStatus = tempStatus; /* reset child status to foreground status */
}
Edit: Adding the code that registers the signal handler:
struct sigaction sa;
sa.sa_handler = &handleSignal; /*passing function ref. to handler */
sa.sa_flags = SA_RESTART; /* restart the shell signal handler */
sigfillset(&sa.sa_mask); /*block all other signals while handling sigs */
sigaction(SIGINT, &sa, NULL);
sigaction(SIGCHLD, &sa, NULL);
sigaction(SIGTERM, &sa, NULL);
Disregard. Solved it. Here's what the problem was: I was waiting on the foreground child process in the main program loop, and I was also waiting on it in the signal handler. In the main loop I was using waitoptions = 0, so the program would wait, and in the handler WNOHANG.
So why did it behave differently in Linux versus Darwin? Here's my theory: On the Darwin box, when the child died, the SIGCHLD signal was being delivered and handled before the waitpid in the parent's main loop caught that the child had died. So the signal handler handled the dying foreground child. Then, when execution returned to the waitpid command in the main loop, there was no child with that pid, and the waitpid function returned with an error of -1. However, I wasn't checking the return value of that waitpid call, which meant that the error was silent to me, and the program operated correctly on the Darwin machine.
On the Linux machine, however, the waitpid in the main program loop executed first, before the signal handler. So childStatus would be set (correctly) in the main program loop. Then, when it came time for the signal handler to catch the SIGCHLD signal, the process had already been waited on. So waitpid returned an error (-1, errno 10, which is ECHILD on Linux). But because of how I handled the child status variable in the signal handler (with childStatus and tempStatus), this situation would clobber my child status, resetting it back to 0. So every foreground child showed an exit status of 0 when run on the Linux machine, but an appropriate exit status on the Mac. The solution? Change the tempStatus declaration to int tempStatus = childStatus. That way, if childStatus has already been set by a foreground process, the entire loop is skipped and the status persists. If the signal handler is called on behalf of a background process, however, the signal handler saves the foreground status, if it exists, displays the background status as it handles the background process, and then resets the foreground status.
Ugly... but it will work well enough for a grade in a couple of hours.
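The "already waited on" case is easy to reproduce in isolation. This is only an illustrative sketch (the function name second_wait_demo is made up, and it assumes the calling process has no other children): the first, blocking waitpid collects the status, and a second WNOHANG call then fails with ECHILD, which is the situation the handler above has to tolerate.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the first wait got the child's status and the second wait
   failed with ECHILD; 0 if it went differently; -1 if fork/wait failed. */
int second_wait_demo(void)
{
    int status = 0;
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0)
        _exit(3);                       /* child: exit status 3 */

    /* First wait (as in the main loop): blocks and collects the status. */
    if (waitpid(child, &status, 0) != child)
        return -1;
    int first = WIFEXITED(status) ? WEXITSTATUS(status) : -1;

    /* Second wait (as in the handler): there is nothing left to reap. */
    errno = 0;
    pid_t again = waitpid(-1, &status, WNOHANG);

    return (first == 3 && again == -1 && errno == ECHILD) ? 1 : 0;
}
```

Checking the return value of every waitpid call, in both the main loop and the handler, would have made the difference between the two systems visible right away.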
I don't know if any of this will ever help anyone else, but hey, it might. Talk about frustrating!

Problems with signal handling

After a fork call, I have a parent that must send SIGUSR1 or SIGUSR2 (based on the value of the 'cod' variable) to its child. The child has to install the proper handlers before receiving SIGUSR1 or SIGUSR2. To do so, I pause the parent, waiting for the child to signal that it is done with the handler installation. The parent is signaled with SIGUSR1, and the handler for this signal is installed before the fork call. However, it seems the parent can't return from pause, which makes me think that it never actually calls the SIGUSR1 handler.
[...]
typedef enum{FALSE, TRUE} boolean;
boolean sigusr1setted = FALSE;
boolean sigusr2setted = FALSE;
void
sigusr1_handler0(int signo){
return;
}
void
sigusr1_handler(int signo){
sigusr1setted = TRUE;
}
void
sigusr2_handler(int signo){
sigusr2setted = TRUE;
}
int main(int argc, char *argv[]){
[...]
if(signal(SIGUSR1, sigusr1_handler0) == SIG_ERR){
perror("signal 0 error");
exit(EXIT_FAILURE);
}
pid = fork();
if (pid == 0){
if(signal(SIGUSR1, sigusr1_handler) == SIG_ERR){
perror("signal 1 error");
exit(EXIT_FAILURE);
}
if(signal(SIGUSR2, sigusr2_handler) == SIG_ERR){
perror("signal 2 error");
exit(EXIT_FAILURE);
}
kill(SIGUSR1, getppid()); // wake up parent by signaling him with sigusr1
// Wait for the parent to send the signals...
pause();
if(sigusr1setted){
if(execl("Prog1", "Prog1", (char*)0) < 0){
perror("exec P1 error");
exit(EXIT_FAILURE);
}
}
if(sigusr2setted){
if(execl("Prog2", "Prog2", (char*)0) < 0){
perror("exec P2 error");
exit(EXIT_FAILURE);
}
}
// Should'nt reach this point : something went wrong...
exit(EXIT_FAILURE);
}else if (pid > 0){
// The father must wake only after the child has done with the handlers installation
pause();
// Never reaches this point ...
if (cod == 1)
kill(SIGUSR1, pid);
else
kill(SIGUSR2, pid);
// Wait for the child to complete..
if(wait(NULL) == -1){
perror("wait 2 error");
exit(EXIT_FAILURE);
}
[...]
}else{
perror("fork 2 error");
exit(EXIT_FAILURE);
}
[...]
exit(EXIT_SUCCESS);
}
Assembling a plausible answer from the comments - so this is Community Wiki from the outset. (If Oli provides an answer, up-vote that instead of this!)
Oli Charlesworth gave what is probably the core of the problem:
I suspect you have produced a race condition in the opposite direction to what you anticipated. The child sent SIGUSR1 to the parent before the parent reached the pause().
ouah noted accurately:
An object shared between the signal handler and the non-handler code (your boolean objects) must have a volatile sig_atomic_t type otherwise the code is undefined.
That said, POSIX allows a little more laxity than standard C does for what can be done inside a signal handler. We might also note that C99 provides <stdbool.h> to define the bool type.
The original poster commented:
I don't see how can I make sure that the parent goes in the pause() call first without using sleep() in the child (which guarantees nothing). Any ideas?
Suggestion: Use usleep() (µ-sleep, or sleep in microseconds), or nanosleep() (sleep in nanoseconds)?
Or use a different synchronization mechanism, such as:
parent process creates FIFO;
fork();
child opens FIFO for writing (blocking until there is a reader);
parent opens FIFO for reading (blocking until there is a writer);
when unblocked because the open() calls return, both processes simply close the FIFO;
the parent removes the FIFO.
Note that there is no data communication between the two processes via the FIFO; the code is simply relying on the kernel to block the processes until there is a reader and a writer, so both processes are ready to go.
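The FIFO steps above can be sketched roughly as follows. The temporary directory under /tmp, the file name, and the function name fifo_rendezvous_demo are all assumptions for illustration; a real program would pick its own location and do more error reporting.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 0 once both processes have passed the rendezvous, -1 on error. */
int fifo_rendezvous_demo(void)
{
    char dir[] = "/tmp/fifo_sync_XXXXXX";   /* hypothetical location */
    char path[64];
    int result = -1;

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(path, sizeof path, "%s/sync", dir);
    if (mkfifo(path, 0600) == -1) {          /* parent creates the FIFO */
        rmdir(dir);
        return -1;
    }

    pid_t child = fork();
    if (child == 0) {
        /* Child: open() blocks until the parent opens for reading. */
        int fd = open(path, O_WRONLY);
        if (fd >= 0)
            close(fd);
        /* ...the child would carry on with its real work here... */
        _exit(fd >= 0 ? 0 : 1);
    }
    if (child > 0) {
        int status = 0;
        /* Parent: open() blocks until the child opens for writing. */
        int fd = open(path, O_RDONLY);
        if (fd >= 0)
            close(fd);
        if (waitpid(child, &status, 0) == child && fd >= 0 &&
            WIFEXITED(status) && WEXITSTATUS(status) == 0)
            result = 0;
    }

    unlink(path);                            /* the parent removes the FIFO */
    rmdir(dir);
    return result;
}
```

In the program from the question, the child would install its handlers before opening the FIFO, so the parent knows they are in place as soon as its own open() returns.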
Another possibility is that the parent process could try if (sigusr1setted == FALSE) pause(); to reduce the window for the race condition. However, it only reduces the window; it does not guarantee that the race condition cannot occur. That is, Murphy's Law applies and the signal could arrive between the time the test is complete and the time the pause() is executed.
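If you do want to stay with signals, the usual way to close that window completely (this goes beyond what the comments suggested, so treat it as my assumption) is to block the signal before fork() and wait with sigsuspend(), which unblocks the signal and sleeps in one atomic step. A minimal sketch, with made-up names handshake_demo and child_ready. Note also that kill() takes the process ID first and the signal number second, which is the reverse of the order used in the question's listing.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t child_ready = 0;

static void on_ready(int sig) { (void)sig; child_ready = 1; }

/* Returns 1 once the parent has seen the child's SIGUSR1, -1 on error. */
int handshake_demo(void)
{
    struct sigaction sa, old_sa;
    sigset_t block, old, wait_mask;

    sa.sa_handler = on_ready;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGUSR1, &sa, &old_sa) == -1)
        return -1;

    /* Block SIGUSR1 BEFORE fork(): from now on it can only be delivered
       while the parent is inside sigsuspend(). */
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    sigprocmask(SIG_BLOCK, &block, &old);
    wait_mask = old;
    sigdelset(&wait_mask, SIGUSR1);

    child_ready = 0;
    pid_t child = fork();
    if (child == 0) {
        /* (the child would install its own handlers here) */
        kill(getppid(), SIGUSR1);   /* pid first, then the signal number */
        _exit(0);
    }
    if (child < 0) {
        sigprocmask(SIG_SETMASK, &old, NULL);
        sigaction(SIGUSR1, &old_sa, NULL);
        return -1;
    }

    while (!child_ready)
        sigsuspend(&wait_mask);     /* unblock + sleep, atomically */

    waitpid(child, NULL, 0);
    sigprocmask(SIG_SETMASK, &old, NULL);
    sigaction(SIGUSR1, &old_sa, NULL);
    return child_ready;
}
```

Even if the child signals before the parent gets anywhere near the wait, the signal just stays pending and is delivered the moment sigsuspend() lifts the mask.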
All of this goes to show that signals are not a very good IPC mechanism. They can be used for IPC, but they should seldom actually be used for synchronization.
Incidentally, there's no need to test the return value of any of the exec*() family of functions. If the system call returns, it failed.
And the questioner asked again:
Wouldn't it be better to use POSIX semaphores shared between processes?
Semaphores would certainly be another valid mechanism for synchronizing the two processes. Since I'd certainly have to look at the manual pages for semaphores whereas I can remember how to use FIFOs without looking, I'm not sure that I'd actually use them, but creating and removing a FIFO has its own set of issues so it is not clear that it is in any way 'better' (or 'worse'); just different. It's mkfifo(), open(), close(), unlink() for FIFOs versus sem_open() (or sem_init()), sem_post(), sem_wait(), sem_close(), and maybe sem_unlink() (or sem_destroy()) for semaphores. You might want to think about registering a 'FIFO removal' or 'semaphore cleanup' function with atexit() to make sure the FIFO or semaphore is destroyed under as many circumstances as possible. However, that's probably OTT for a test program.

c / interrupted system call / fork vs. thread

I discovered an issue with thread implementation, that is strange to me. Maybe some of you can explain it to me, would be great.
I am working on something like a proxy, a program (running on different machines) that receives packets over eth0 and sends it through ath0 (wireless) to another machine which is doing the exactly same thing. Actually I am not at all sure what is causing my problem, that's because I am new to everything, linux and c programming.
I start two threads,
one is listening (socket) on eth0 for incoming packets and sends it out through ath0 (also socket)
and the other thread is listening on ath0 and sends through eth0.
If I use threads, I get an error like that:
sh-2.05b# ./socketex
Failed to send network header packet.
: Interrupted system call
If I use fork(), the program works as expected.
Can someone explain that behaviour to me?
Just to show the sender implementation here comes its code snippet:
while(keep_going) {
memset(&buffer[0], '\0', sizeof(buffer));
recvlen = recvfrom(sockfd_in, buffer, BUFLEN, 0, (struct sockaddr *) &incoming, &ilen);
if(recvlen < 0) {
perror("something went wrong / incoming\n");
exit(-1);
}
strcpy(msg, buffer);
buflen = strlen(msg);
sentlen = ath_sendto(sfd, &btpinfo, &addrnwh, &nwh, buflen, msg, &selpv2, &depv);
if(sentlen == E_ERR) {
perror("Failed to send network header packet.\n");
exit(-1);
}
}
UPDATE: my main file, starting either threads or processes (fork)
int main(void) {
port_config pConfig;
memset(&pConfig, 0, sizeof(pConfig));
pConfig.inPort = 2002;
pConfig.outPort = 2003;
pid_t retval = fork();
if(retval == 0) {
// child process
pc2wsuThread((void *) &pConfig);
} else if (retval < 0) {
perror("fork not successful\n");
} else {
// parent process
wsu2pcThread((void *) &pConfig);
}
/*
wint8 rc1, rc2 = 0;
pthread_t pc2wsu;
pthread_t wsu2pc;
rc1 = pthread_create(&pc2wsu, NULL, pc2wsuThread, (void *) &pConfig);
rc2 = pthread_create(&wsu2pc, NULL, wsu2pcThread, (void *) &pConfig);
if(rc1) {
printf("error: pthread_create() is %d\n", rc1);
return(-1);
}
if(rc2) {
printf("error: pthread_create() is %d\n", rc2);
return(-1);
}
pthread_join(pc2wsu, NULL);
pthread_join(wsu2pc, NULL);
*/
return 0;
}
Does it help?
update 05/30/2011
-sh-2.05b# ./wsuproxy 192.168.1.100
mgmtsrvc
mgmtsrvc
Failed to send network header packet.
: Interrupted system call
13.254158,75.165482,DATAAAAAAmgmtsrvc
mgmtsrvc
mgmtsrvc
Still get the interrupted system call, as you can see above.
I blocked all signals as follows:
sigset_t signal_mask;
sigfillset(&signal_mask);
sigprocmask(SIG_BLOCK, &signal_mask, NULL);
The two threads are working on the same interfaces, but on different ports. The problem seems to appear still in the same place (please find it in the first code snippet). I can't go further and have not enough knowledge of how to solve that problem. Maybe some of you can help me here again.
Thanks in advance.
EINTR does not itself indicate an error. It means that your process received a signal while it was in the sendto syscall, and that syscall hadn't sent any data yet (that's important).
You could retry the send in this case, but a good thing would be to figure out what signal caused the interruption. If this is reproducible, try using strace.
If you're the one sending the signal, well, you know what to do :-)
Note that on linux, you can receive EINTR on sendto (and some other functions) even if you haven't installed a handler yourself. This can happen if:
the process is stopped (via SIGSTOP for example) and restarted (with SIGCONT)
you have set a send timeout on the socket (via SO_SNDTIMEO)
See the signal(7) man page (at the very bottom) for more details.
So if you're "suspending" your service (or something else is), that EINTR is expected and you should restart the call.
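Restarting the call can be as small as the wrapper below. This is a sketch for a plain sendto(); the name sendto_retry is made up, and since ath_sendto from the question is a custom function I don't know, the same loop would have to go around whichever system call it uses internally.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical wrapper: like sendto(), but tries again when the call was
   interrupted by a signal before any data was sent. */
ssize_t sendto_retry(int fd, const void *buf, size_t len, int flags,
                     const struct sockaddr *dest, socklen_t dest_len)
{
    ssize_t n;
    do {
        n = sendto(fd, buf, len, flags, dest, dest_len);
    } while (n == -1 && errno == EINTR);
    return n;   /* bytes sent, or -1 with errno set to a real error */
}
```

Any result other than EINTR is passed through unchanged, so the caller's existing perror/exit handling still applies to genuine failures.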
Keep in mind, if you are using threads with signals, that a given signal, when delivered to the process, could be delivered to any thread whose signal mask is not blocking the signal. That means if you have blocked incoming signals in one thread, and not in another, the non-blocking thread will receive the signal, and if there is no signal handler set up for the signal, you will end up with the default behavior of that signal for the entire process (i.e., all the threads, both signal-blocking threads and non-signal-blocking threads). For instance, if the default behavior of a signal is to terminate a process, one thread catching that signal and executing its default behavior will terminate the entire process, for all the threads, even though some threads may have been masking the signal. Also, if you have two threads that are not blocking a signal, it is not deterministic which thread will handle the signal. Therefore it's typically the case that mixing signals and threads is not a good idea, but there are exceptions to the rule.
One thing you can try, since the signal mask of a spawned thread is inherited from the thread that creates it, is to create a daemon thread for handling signals: at the start of your program, you block all incoming signals (or at least all non-important signals), and then spawn your threads. Those spawned threads will then not be interrupted by any signal in the blocked signal mask inherited from the parent thread. If you need to handle some specific signals, you can still make those signals part of the blocked signal mask for the main process, and then spawn your threads. But when you're spawning the threads, leave one thread (it could even be the main process thread after it has spawned all the worker threads) as a "daemon" thread waiting for those specific incoming (and now blocked) signals using sigwait(). That thread will then dispatch whatever functions are necessary when a given signal is received by the process. This keeps signals from interrupting system calls in your other worker threads, yet still allows you to handle signals.
The reason your forked version may not be having issues is because if a signal arrives at one parent process, it is not propagated to any child processes. So I would try, if you can, to see what signal it is that is terminating your system call, and in your threaded version, block that signal, and if you need to handle it, create a daemon-thread that will handle that signal's arrival, with the rest of the threads blocking that signal.
Finally, if you don't have access to any external libraries or debuggers, etc. to see what signals are arriving, you can setup a simple procedure for seeing what signals might be arriving. You can try this code:
#include <signal.h>
#include <stdio.h>
int main()
{
//block all incoming signals
sigset_t signal_mask;
sigfillset(&signal_mask);
sigprocmask(SIG_BLOCK, &signal_mask, NULL);
//... spawn your threads here ...
//... now wait for signals to arrive and see what comes in ...
int arrived_signal;
while(1) //you can change this condition to whatever to exit the loop
{
sigwait(&signal_mask, &arrived_signal);
switch(arrived_signal)
{
case SIGABRT: fprintf(stderr, "SIGABRT signal arrived\n"); break;
case SIGALRM: fprintf(stderr, "SIGALRM signal arrived\n"); break;
//continue for the rest of the signals defined in signal.h ...
default: fprintf(stderr, "Unrecognized signal arrived\n");
}
}
//clean-up your threads and anything else needing clean-up
return 0;
}
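The core of the sigwait() approach can be tried out in a few lines. The sketch below is deliberately single-threaded to stay short (the function name sigwait_demo is made up): it blocks SIGUSR1, sends it to itself so that it stays pending, and then picks it up synchronously with sigwait(). In the threaded program, the blocking step would be done with pthread_sigmask() before pthread_create(), and the sigwait() loop would live in the dedicated thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns the number of the signal picked up by sigwait(), or -1. */
int sigwait_demo(void)
{
    sigset_t set, old;
    int sig = 0;

    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);

    /* Block the signal; no handler is needed and no system call
       will be interrupted by it. */
    if (sigprocmask(SIG_BLOCK, &set, &old) == -1)
        return -1;

    kill(getpid(), SIGUSR1);        /* stays pending because it is blocked */

    if (sigwait(&set, &sig) != 0)   /* consumes the pending signal */
        sig = -1;

    sigprocmask(SIG_SETMASK, &old, NULL);
    return sig;
}
```

Because the signal is never delivered asynchronously, the code that reacts to it runs as ordinary code and is not limited to async-signal-safe functions.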

Resources