I'm writing my own echo server using sockets and syscalls. I am using epoll to work with many clients at the same time, and all the operations done with clients are nonblocking. When the server is up and doing nothing, it sits in epoll_wait. Now I want to add the possibility of shutting the server down using signals: for example, I start the server in a bash terminal, then I press ctrl-c, and the server somehow handles SIGINT. My plan is to use signalfd. I create a new signalfd and add it to the epoll instance with the following code:
sigset_t mask;
sigemptyset(&mask);
sigaddset(&mask, SIGTERM);
sigaddset(&mask, SIGINT);
signal_fd = signalfd(-1, &mask, 0);
epoll_event event;
event.data.fd = signal_fd;
event.events = EPOLLIN;
epoll_ctl(fd, EPOLL_CTL_ADD, signal_fd, &event);
Then I expect that when epoll is waiting and I press ctrl-c, an event happens on the epoll instance, it wakes up, and I handle the signal with the following code:
if (events[i].data.fd == signal_fd)
{
    // do something
    exit(0);
}
Though in reality the server just stops without handling the signal. What am I doing wrong, and what is the correct way to solve my problem? And if I'm not understanding signals correctly, where should one use signalfd?
epoll_wait returns -1 and errno == EINTR when it is interrupted by a signal. In this case you need to read from signal_fd.
Set the signal handler for your signals to SIG_IGN, otherwise signals may terminate your application.
See the signal(7) man page:
The following interfaces are never restarted after being interrupted by
a signal handler, regardless of the use of SA_RESTART; they always fail
with the error EINTR when interrupted by a signal handler:
File descriptor multiplexing interfaces: epoll_wait(2),
epoll_pwait(2), poll(2), ppoll(2), select(2), and pselect(2).
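In code, that check might look like this (a minimal sketch; epfd, events and MAX_EVENTS stand in for your own variables):

for (;;) {
    int n = epoll_wait(epfd, events, MAX_EVENTS, -1);
    if (n == -1) {
        if (errno == EINTR)
            continue; /* interrupted by a signal; check/read signal_fd here */
        /* a real error: handle it */
    }
    /* ... dispatch the n ready events ... */
}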
Signal handlers are per process. You left the signal handler at the default, which is to terminate the process.
So you need to add something like this,
struct sigaction action;
std::memset(&action, 0, sizeof(struct sigaction));
action.sa_handler = your_handler;
sigaction(signum, &action, NULL);
for each signum that you want your application to receive interrupts for. Also handle the return value of sigaction. My experience is that if you use SIG_IGN as the handler, then you can still interrupt a system call like epoll_pwait from the "outside", but it won't work when you try to wake up a thread from within the program itself by sending the signal directly to that thread using pthread_kill.
Next you need to mask all signals in every thread, so that by default no thread will receive them (otherwise a random thread is woken up to handle a signal). The easiest way to do that is in main, before creating any threads.
For example,
sigset_t all_signals;
sigemptyset(&all_signals);
sigaddset(&all_signals, signum); // Repeat for each signum that you use.
sigprocmask(SIG_BLOCK, &all_signals, NULL);
And then unblock the signals per thread when you want that thread to receive the signal.
If you use signalfd, then you do not want to unblock them; keep them blocked and pass the appropriate mask to signalfd, which uses the passed mask to decide which signals to deliver through the descriptor. See also the man page of signalfd.
epoll_pwait works differently; as with pselect, you unblock the signal that you are interested in only during the wait. You set a handler for that signal (see above) that sets a flag. Just before calling epoll_pwait you block the signal, then test the flag and handle it, and then call epoll_pwait without first unblocking the signal (the mask you pass to epoll_pwait unblocks it for the duration of the call). After epoll_pwait returns you can unblock the signal again so that your handler can be called again.
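One way to arrange that pattern in code (a sketch; epfd, events, MAX_EVENTS and the choice of SIGINT are placeholders, and SIGINT is assumed already blocked as described above):

volatile sig_atomic_t got_signal = 0;

void your_handler(int sig) { got_signal = 1; }

/* before the event loop: build the mask used during the wait */
sigset_t wait_mask;
sigprocmask(SIG_SETMASK, NULL, &wait_mask); /* copy of the current mask */
sigdelset(&wait_mask, SIGINT);              /* unblocked only inside epoll_pwait */

for (;;) {
    if (got_signal) {
        got_signal = 0;
        /* handle the signal, e.g. initiate shutdown */
    }
    int n = epoll_pwait(epfd, events, MAX_EVENTS, -1, &wait_mask);
    if (n == -1 && errno == EINTR)
        continue; /* a signal arrived during the wait; loop around to test the flag */
    /* ... handle the n ready events ... */
}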
You have to block all the signals you want to handle with your signal-FD before you create that signal-FD. Otherwise, those signals still interrupt blocked system calls such as epoll_wait() - as you observed.
See also the signalfd(2) man page:
Normally, the set of signals to be received via the file descriptor
should be blocked using sigprocmask(2), to prevent the signals being
handled according to their default dispositions.
Thus, you have to change your example like this:
sigset_t mask;
sigemptyset(&mask);
sigaddset(&mask, SIGTERM);
sigaddset(&mask, SIGINT);

int r = sigprocmask(SIG_BLOCK, &mask, 0);
if (r == -1) {
    // XXX handle errors
}

signal_fd = signalfd(-1, &mask, 0);
if (signal_fd == -1) {
    // XXX handle errors
}

epoll_event event;
event.data.fd = signal_fd;
event.events = EPOLLIN;

r = epoll_ctl(fd, EPOLL_CTL_ADD, signal_fd, &event);
if (r == -1) {
    // XXX handle errors
}
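With the signals blocked like this, the branch from the question can consume them by reading a struct signalfd_siginfo from the descriptor (a sketch):

if (events[i].data.fd == signal_fd) {
    struct signalfd_siginfo si;
    ssize_t n = read(signal_fd, &si, sizeof(si));
    if (n == sizeof(si) &&
        (si.ssi_signo == SIGINT || si.ssi_signo == SIGTERM)) {
        // clean up and shut down
        exit(0);
    }
}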
I got this issue:
I made a program in C where the main process creates some child processes, and these, after a while, send a signal to the main process.
The signal is sent with this code:
kill(getppid(), SIGUSR1);
and the main process, in a while loop, waits for the SIGUSR1 signal...
Everything is fine, but if I increase the number of children, and with it the chance of several signals arriving at the same time, the program crashes, printing the message:
User defined signal 1
The main code is like this:
void signalHandler(int sig, siginfo_t* info, void* vp) {
    if (sig == SIGUSR1) {
        printf("SIGUSR1 has arrived\n");
    } else if (sig == SIGUSR2) {
        printf("SIGUSR2 has arrived\n");
    }
}

int main(int argc, char const *argv[]) {
    struct sigaction action, old_action;
    memset(&action, 0, sizeof(struct sigaction));
    action.sa_sigaction = signalHandler;
    sigemptyset(&action.sa_mask);
    action.sa_flags = SA_RESTART | SA_NODEFER;

    while (1) {
        sigaction(SIGUSR1, &action, &old_action);
        sigaction(SIGUSR2, &action, &old_action);
    }
}
I think the problem is that a signal is sent while the parent is still handling the previous one... but how can I fix this?
Thank you very much.
It means that the child is sending the signal before the parent process was able to call sigaction() to configure the signal handler. When this happens, the default signal reaction to SIGUSR1 terminates the program:
SIGUSR1 P1990 Term User-defined signal 1
https://man7.org/linux/man-pages/man7/signal.7.html
However, there are many problems with your code. printf() is not safe to be called inside a signal handler (it's AS-Unsafe as defined by POSIX):
https://pubs.opengroup.org/onlinepubs/9699919799.2018edition/functions/V2_chap02.html#tag_15_04_03
Also, using SA_NODEFER may create nested signals (another signal handler is called while some signal handler is still running), but your program does not protect against a flood; given enough children, this will cause a stack overflow. Finally, the main program keeps running a non-stop infinite loop reconfiguring the signals, while it should configure them only once, outside the loop, and block inside the loop (for example with sigwait() or pselect()):
https://man7.org/linux/man-pages/man2/select.2.html
Lastly, if you expect to run a large number of children that might flood the parent with signals, it would be better to use the real-time signal generation function sigqueue() rather than kill(). The difference is that with sigqueue() all signals are queued, and SA_NODEFER is not necessary to avoid discarding signals while another signal handler is running:
https://pubs.opengroup.org/onlinepubs/9699919799.2018edition/functions/V2_chap02.html#tag_15_04_02
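In the child, the kill() call would then become something like:

union sigval value;
value.sival_int = 0; /* optional payload */
sigqueue(getppid(), SIGUSR1, value);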
Final conclusion: the code should be completely rewritten.
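For example, a rough sketch of a rewritten parent, blocking the signals before fork() and receiving them synchronously with sigwait(), so there is no handler and no printf() problem:

#include <signal.h>
#include <stdio.h>

int main(void) {
    sigset_t mask;
    int sig;

    /* block the signals before any child can send them, so none are lost */
    sigemptyset(&mask);
    sigaddset(&mask, SIGUSR1);
    sigaddset(&mask, SIGUSR2);
    sigprocmask(SIG_BLOCK, &mask, NULL);

    /* ... fork() the children here; they inherit the blocked mask ... */

    while (1) {
        if (sigwait(&mask, &sig) != 0)
            continue;
        if (sig == SIGUSR1)
            printf("SIGUSR1 has arrived\n"); /* safe here: not inside a handler */
        else if (sig == SIGUSR2)
            printf("SIGUSR2 has arrived\n");
    }
}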
I am doing a simple server/client program in C which listens on a network interface and accepts clients. Each client is handled in a forked process.
The goal I have is to let the parent process know once a client has disconnected from the child process.
Currently my main loop looks like this:
for (;;) {
    /* 1. [network] Wait for new connection... (BLOCKING CALL) */
    fd_listen[client] = accept(fd_listen[server], (struct sockaddr *)&cli_addr, &clilen);
    if (fd_listen[client] < 0) {
        perror("ERROR on accept");
        exit(1);
    }

    /* 2. [process] Call socketpair */
    if ( socketpair(AF_LOCAL, SOCK_STREAM, 0, fd_comm) != 0 ) {
        perror("ERROR on socketpair");
        exit(1);
    }

    /* 3. [process] Call fork */
    pid = fork();
    if (pid < 0) {
        perror("ERROR on fork");
        exit(1);
    }

    /* 3.1 [process] Inside the Child */
    if (pid == 0) {
        printf("[child] num of clients: %d\n", num_client+1);
        printf("[child] pid: %ld\n", (long) getpid());
        close(fd_comm[parent]);   // Close the parent socket file descriptor
        close(fd_listen[server]); // Close the server socket file descriptor

        // Tasks that the child process should be doing for the connected client
        child_processing(fd_listen[client]);
        exit(0);
    }
    /* 3.2 [process] Inside the Parent */
    else {
        num_client++;
        close(fd_comm[child]);    // Close the child socket file descriptor
        close(fd_listen[client]); // Close the client socket file descriptor
        printf("[parent] num of clients: %d\n", num_client);

        while ( (w = waitpid(-1, &status, WNOHANG)) > 0) {
            printf("[EXIT] child %d terminated\n", w);
            num_client--;
        }
    }
} /* end of while */
It all works well, the only problem I have is (probably) due to the blocking accept call.
When I connect to the above server, a new child process is created and child_processing is called.
However when I disconnect with that client, the main parent process does not know about it and does NOT output printf("[EXIT] child %d terminated\n", w);
But, when I connect with a second client after the first client has disconnected, the main loop is able to finally process the while ( (w = waitpid(-1, &status, WNOHANG)) > 0) part and tell me that the first client has disconnected.
If there will be only ever one client connecting and disconnecting afterwards, my main parent process will never be able to tell if it has disconnected or not.
Is there any way to tell the parent process that my client already left?
UPDATE
As I am a real beginner with C, it would be nice if you provided some short snippets in your answer so I can actually understand it :-)
Your waitpid usage is not correct. It is a non-blocking call, so if the child is not finished then the call returns 0:
waitpid(): on success, returns the process ID of the child whose state
has changed; if WNOHANG was specified and one or more child(ren)
specified by pid exist, but have not yet changed state, then 0 is
returned. On error, -1 is returned.
So you immediately fall out of the while loop. Of course this can be caught later, when the first child has terminated and a second one lets you reach the waitpid call again.
Since you need a non-blocking wait, I suggest not managing termination directly but through the SIGCHLD signal, which lets you catch the termination of any child and then call waitpid appropriately in the handler:
void handler(int signal) {
    // reap all children that have terminated so far;
    // adjust the condition and parameters for your needs
    while (waitpid(-1, NULL, WNOHANG) > 0) {
    }
}
...
struct sigaction act;
act.sa_flags = 0;
sigemptyset(&(act.sa_mask));
act.sa_handler = handler;
sigaction(SIGCHLD, &act, NULL);
... // now ready to receive SIGCHLD when at least one child changes its state
If I understand correctly, you want to be able to service multiple clients at once, and therefore your waitpid call is correct in that it does not block if no child has terminated.
However, the problem you then have is that you need to be able to process asynchronous child termination while waiting for new clients via accept.
Assuming that you're dealing with a POSIXy system, merely having a SIGCHLD handler established and having the signal unmasked (via sigprocmask, though IIRC it is unmasked by default), should be enough to cause accept to fail with EINTR if a child terminates while you are waiting for a new client to connect - and you can then handle EINTR appropriately.
The reason for this is that a SIGCHLD signal will be automatically sent to the parent process when a child process terminates. In general, system calls such as accept will return an error of EINTR ("interrupted") if a signal is received while they are waiting.
However, there would still be a race condition, where a child terminates just before you call accept (i.e. in between where you already have the waitpid and the accept). There are two main possibilities to overcome this:
Do all the child termination processing in your SIGCHLD handler, instead of the main loop. This may not be feasible, however, since there are significant limits to what you are allowed to do within a signal handler. You may not call printf for example (though you may use write).
I do not suggest you go down this path, although it may seem simpler at first it is the least flexible option and may prove unworkable later.
Write to one end of a non-blocking pipe in your SIGCHLD signal handler. Within the main loop, instead of calling accept directly, use poll (or select) to look for readiness on both the socket and the read end of the pipe, and handle each appropriately.
On Linux (and OpenBSD, I'm not sure about others) you can use ppoll (man page) to avoid the need to create a pipe (and in this case you should leave the signal masked, and have it unmasked during the poll operation; if ppoll fails with EINTR, you know that a signal was received, and you should call waitpid). You still need to set a signal handler for SIGCHLD, but it doesn't need to do anything.
Another option on Linux is to use signalfd (man page) to avoid both the need to create a pipe and set up a signal handler (I think). You should mask the SIGCHLD signal (using sigprocmask) if you use this. When poll (or equivalent) indicates that the signalfd is active, read the signal data from it (which clears the signal) and then call waitpid to reap the child.
On various BSD systems you can use kqueue (OpenBSD man page) instead of poll and watch for signals without needing to establish a signal handler.
On other POSIX systems you may be able to use pselect (documentation) in a similar way to ppoll as described above.
There is also the option of using a library such as libevent to abstract away the OS-specifics.
The Glibc manual has an example of using select. Consult the manual pages for poll, ppoll, pselect for more information about those functions. There is an online book on using Libevent.
Rough example for using select, borrowed from Glibc documentation (and modified):
/* Set up a pipe and set signal handler for SIGCHLD */
int pipefd[2]; /* must be a global variable */

pipe(pipefd); /* TODO check for error return */
fcntl(pipefd[1], F_SETFL, O_NONBLOCK); /* set write end non-blocking */

/* signal handler */
void sigchld_handler(int signum)
{
    char a = 0; /* write anything, doesn't matter what */
    write(pipefd[1], &a, 1);
}

/* set up signal handler */
signal(SIGCHLD, sigchld_handler);
Where you currently have accept, you need to check status of the server socket and the read end of the pipe:
fd_set set, outset;

/* Initialize the file descriptor set. */
FD_ZERO (&set);
FD_SET (fd_listen[server], &set);
FD_SET (pipefd[0], &set);

for (;;) {
    outset = set; /* select() modifies the set passed in, so work on a copy */
    select (FD_SETSIZE, &outset, NULL, NULL, NULL /* no timeout */);
    /* TODO check for error return.
       EINTR should just continue the loop. */

    if (FD_ISSET(fd_listen[server], &outset)) {
        /* now do accept() etc */
    }
    if (FD_ISSET(pipefd[0], &outset)) {
        /* now do waitpid(), and read a byte from the pipe */
    }
}
Using other mechanisms is generally simpler, so I leave those as an exercise :)
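As a starting point for that exercise, the ppoll variant might look roughly like this (a sketch reusing the question's variables; it assumes SIGCHLD is blocked beforehand, a do-nothing handler is installed so it is not ignored, and _GNU_SOURCE is defined for ppoll):

sigset_t empty;
sigemptyset(&empty);

struct pollfd pfd = { .fd = fd_listen[server], .events = POLLIN };
for (;;) {
    int r = ppoll(&pfd, 1, NULL, &empty); /* SIGCHLD unblocked only inside ppoll */
    if (r == -1 && errno == EINTR) {
        while ((w = waitpid(-1, &status, WNOHANG)) > 0) {
            printf("[EXIT] child %d terminated\n", w);
            num_client--;
        }
        continue;
    }
    if (pfd.revents & POLLIN) {
        /* now do accept() etc */
    }
}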
Context is this Redis issue. We have a wait3() call that waits for the AOF rewriting child to create the new AOF version on disk. When the child is done, the parent is notified via wait3() in order to substitute the old AOF with the new one.
However, in the context of the above issue, a user notified us about a bug. I modified the implementation of Redis 3.0 a bit so that it clearly logs when wait3() returns -1, instead of crashing because of this unexpected condition. So this is what happens apparently:
wait3() is called when we have pending children to wait for.
SIGCHLD should be set to SIG_DFL; there is no code setting this signal at all in Redis, so it's the default behavior.
When the first AOF rewrite happens, wait3() successfully works as expected.
Starting from the second AOF rewrite (the second child created), wait3() starts to return -1.
AFAIK it is not possible in the current code that we call wait3() while there are no pending children, since when the AOF child is created, we set server.aof_child_pid to the value of the pid, and we reset it only after a successful wait3() call.
So wait3() should have no reason to fail with -1 and ECHILD, but it does, so probably the zombie child is not created for some unexpected reason.
Hypothesis 1: Is it possible that Linux, under certain odd conditions, will discard the zombie child, for example because of memory pressure? It does not look reasonable, since the zombie has just metadata attached to it, but who knows.
Note that we call wait3() with WNOHANG. And given that SIGCHLD is set to SIG_DFL by default, the only condition that should lead to failing and returning -1 and ECHILD should be that no zombie is available to report the information.
Hypothesis 2: Another thing that could happen (though there is no explanation for how) is that after the first child dies, the SIGCHLD disposition is set to SIG_IGN, causing wait3() to return -1 and ECHILD.
Hypothesis 3: Is there some way to remove zombie children externally? Maybe this user has some kind of script that removes zombie processes in the background, so that the information is no longer available for wait3()? To my knowledge it should never be possible to remove the zombie if the parent does not wait for it (with waitpid or by handling the signal) and SIGCHLD is not ignored, but maybe there is some Linux-specific way.
Hypothesis 4: There is actually some bug in the Redis code so that we successfully wait3() the child the first time without correctly resetting the state, and later we call wait3() again and again but there are no longer zombies, so it returns -1. Analyzing the code it looks impossible, but maybe I'm wrong.
Another important thing: we never observed this in the past. It apparently only happens on this specific Linux system.
UPDATE: Yossi Gottlieb proposed that the SIGCHLD is received by another thread in the Redis process for some reason (does not happen normally, only on this system). We already mask SIGALRM in bio.c threads, perhaps we could try masking SIGCHLD from I/O threads as well.
Appendix: selected parts of Redis code
Where wait3() is called:
/* Check if a background saving or AOF rewrite in progress terminated. */
if (server.rdb_child_pid != -1 || server.aof_child_pid != -1) {
    int statloc;
    pid_t pid;

    if ((pid = wait3(&statloc,WNOHANG,NULL)) != 0) {
        int exitcode = WEXITSTATUS(statloc);
        int bysignal = 0;

        if (WIFSIGNALED(statloc)) bysignal = WTERMSIG(statloc);

        if (pid == -1) {
            redisLog(LOG_WARNING,"wait3() returned an error: %s. "
                "rdb_child_pid = %d, aof_child_pid = %d",
                strerror(errno),
                (int) server.rdb_child_pid,
                (int) server.aof_child_pid);
        } else if (pid == server.rdb_child_pid) {
            backgroundSaveDoneHandler(exitcode,bysignal);
        } else if (pid == server.aof_child_pid) {
            backgroundRewriteDoneHandler(exitcode,bysignal);
        } else {
            redisLog(REDIS_WARNING,
                "Warning, detected child with unmatched pid: %ld",
                (long)pid);
        }
        updateDictResizePolicy();
    }
} else {
Selected parts of backgroundRewriteDoneHandler:
void backgroundRewriteDoneHandler(int exitcode, int bysignal) {
    if (!bysignal && exitcode == 0) {
        int newfd, oldfd;
        char tmpfile[256];
        long long now = ustime();
        mstime_t latency;

        redisLog(REDIS_NOTICE,
            "Background AOF rewrite terminated with success");

        ... more code to handle the rewrite, never calls return ...

    } else if (!bysignal && exitcode != 0) {
        server.aof_lastbgrewrite_status = REDIS_ERR;
        redisLog(REDIS_WARNING,
            "Background AOF rewrite terminated with error");
    } else {
        server.aof_lastbgrewrite_status = REDIS_ERR;
        redisLog(REDIS_WARNING,
            "Background AOF rewrite terminated by signal %d", bysignal);
    }

cleanup:
    aofClosePipes();
    aofRewriteBufferReset();
    aofRemoveTempFile(server.aof_child_pid);
    server.aof_child_pid = -1;
    server.aof_rewrite_time_last = time(NULL)-server.aof_rewrite_time_start;
    server.aof_rewrite_time_start = -1;

    /* Schedule a new rewrite if we are waiting for it to switch the AOF ON. */
    if (server.aof_state == REDIS_AOF_WAIT_REWRITE)
        server.aof_rewrite_scheduled = 1;
}
As you can see, all the code paths must execute the cleanup code that resets server.aof_child_pid to -1.
Errors logged by Redis during the issue
21353:C 29 Nov 04:00:29.957 * AOF rewrite: 8 MB of memory used by copy-on-write
27848:M 29 Nov 04:00:30.133 ^# wait3() returned an error: No child processes. rdb_child_pid = -1, aof_child_pid = 21353
As you can see aof_child_pid is not -1.
TLDR: you are currently relying on unspecified behaviour of signal(2); use sigaction (carefully) instead.
Firstly, SIGCHLD is strange. From the manual page for sigaction:
POSIX.1-1990 disallowed setting the action for SIGCHLD to SIG_IGN. POSIX.1-2001 allows this possibility, so that ignoring SIGCHLD can be used to prevent the creation of zombies (see wait(2)). Nevertheless, the historical BSD and System V behaviors for ignoring SIGCHLD differ, so that the only completely portable method of ensuring that terminated children do not become zombies is to catch the SIGCHLD signal and perform a wait(2) or similar.
And here's the bit from wait(2)'s manual page:
POSIX.1-2001 specifies that if the disposition of SIGCHLD is set to SIG_IGN or the SA_NOCLDWAIT flag is set for SIGCHLD (see sigaction(2)), then children that terminate do not become zombies and a call to wait() or waitpid() will block until all children have terminated, and then fail with errno set to ECHILD. (The original POSIX standard left the behavior of setting SIGCHLD to SIG_IGN unspecified. Note that even though the default disposition of SIGCHLD is "ignore", explicitly setting the disposition to SIG_IGN results in different treatment of zombie process children.) Linux 2.6 conforms to this specification. However, Linux 2.4 (and earlier) does not: if a wait() or waitpid() call is made while SIGCHLD is being ignored, the call behaves just as though SIGCHLD were not being ignored, that is, the call blocks until the next child terminates and then returns the process ID and status of that child.
Note the effect of that: if the signal's handling behaves as though SIG_IGN were set, then (under Linux 2.6+) you will see exactly the behaviour you are seeing, i.e. wait() will return -1 and ECHILD because the child will have been automatically reaped.
Secondly, signal handling with pthreads (which I think you are using here) is notoriously hard. The way it's meant to work (as I'm sure you know) is that process-directed signals get sent to an arbitrary thread within the process that has the signal unmasked. But whilst threads have their own signal mask, there is a process-wide action handler.
Putting these two things together, I think you are running across a problem I've run across before. I have had problems getting SIGCHLD handling to work with signal() (which is fair enough as that was deprecated prior to pthreads), which were fixed by moving to sigaction and carefully setting per thread signal masks. My conclusion at the time was that the C library was emulating (with sigaction) what I was telling it to do with signal(), but was getting tripped up by pthreads.
Note that you are currently relying on unspecified behaviour. From the manual page of signal(2):
The effects of signal() in a multithreaded process are unspecified.
Here's what I recommend you do:
Move to sigaction() and pthread_sigmask(). Explicitly set the handling of all the signals you care about (even if you think that's the current default), even when setting them to SIG_IGN or SIG_DFL. I block signals whilst I do this (possibly an overabundance of caution, but I copied the example from somewhere).
Here's what I am doing (roughly):
sigset_t set;
struct sigaction sa;
/* block all signals */
sigfillset (&set);
pthread_sigmask (SIG_BLOCK, &set, NULL);
/* Set up the structure to specify the new action. */
memset (&sa, 0, sizeof (struct sigaction));
sa.sa_handler = handlesignal; /* signal handler for INT, TERM, HUP, USR1, USR2 */
sigemptyset (&sa.sa_mask);
sa.sa_flags = 0;
sigaction (SIGINT, &sa, NULL);
sigaction (SIGTERM, &sa, NULL);
sigaction (SIGHUP, &sa, NULL);
sigaction (SIGUSR1, &sa, NULL);
sigaction (SIGUSR2, &sa, NULL);
sa.sa_handler = SIG_IGN;
sigemptyset (&sa.sa_mask);
sa.sa_flags = 0;
sigaction (SIGPIPE, &sa, NULL); /* I don't care about SIGPIPE */
sa.sa_handler = SIG_DFL;
sigemptyset (&sa.sa_mask);
sa.sa_flags = 0;
sigaction (SIGCHLD, &sa, NULL); /* I want SIGCHLD to be handled by SIG_DFL */
pthread_sigmask (SIG_UNBLOCK, &set, NULL);
Where possible set all your signal handlers and masks etc. prior to any pthread operations. Where possible do not change signal handlers and masks (you might need to do this prior to and subsequent to fork() calls).
If you need a signal handler for SIGCHLD (rather than relying on SIG_DFL), if possible let it be received by any thread, and use the self-pipe method or similar to alert the main program.
If you must have threads that do/don't handle certain signals, try to restrict yourself to pthread_sigmask in the relevant thread rather than sig* calls.
Just in case you run headlong into the next issue I ran into: ensure that after you have fork()'d, you set up the signal handling again from scratch (in the child) rather than relying on whatever you might inherit from the parent process. If there's one thing worse than signals mixed with pthreads, it's signals mixed with pthreads and fork().
Note that I cannot entirely explain why change (1) works, but it fixed what looks like a very similar issue for me, and it was after all relying on something that was 'unspecified' previously. It's closest to your 'hypothesis 2', but I think it is really incomplete emulation of the legacy signal functions (specifically, emulating the previously racy behaviour of signal(), which is what caused it to be replaced by sigaction() in the first place - but this is just a guess).
Incidentally, I suggest you use wait4() or (as you aren't using rusage) waitpid() rather than wait3(), so you can specify a specific PID to wait for. If you have something else that generates children (I've had a library do it), you may end up waiting for the wrong thing. That said, I don't think that's what's happening here.
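With the snippet above, that would mean something like (a sketch reusing its variables; the RDB child would need its own call):

pid_t pid = waitpid(server.aof_child_pid, &statloc, WNOHANG);
/* 0: the AOF child is still running; -1: error; otherwise it is the AOF child */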
I discovered an issue with my thread implementation that is strange to me. Maybe some of you can explain it to me; that would be great.
I am working on something like a proxy, a program (running on different machines) that receives packets over eth0 and sends them through ath0 (wireless) to another machine which is doing exactly the same thing. Actually I am not at all sure what is causing my problem, because I am new to everything: Linux and C programming.
I start two threads,
one is listening (socket) on eth0 for incoming packets and sends it out through ath0 (also socket)
and the other thread is listening on ath0 and sends through eth0.
If I use threads, I get an error like this:
sh-2.05b# ./socketex
Failed to send network header packet.
: Interrupted system call
If I use fork(), the program works as expected.
Can someone explain that behaviour to me?
Just to show the sender implementation here comes its code snippet:
while(keep_going) {
    memset(&buffer[0], '\0', sizeof(buffer));
    recvlen = recvfrom(sockfd_in, buffer, BUFLEN, 0, (struct sockaddr *) &incoming, &ilen);
    if(recvlen < 0) {
        perror("something went wrong / incoming\n");
        exit(-1);
    }

    strcpy(msg, buffer);
    buflen = strlen(msg);

    sentlen = ath_sendto(sfd, &btpinfo, &addrnwh, &nwh, buflen, msg, &selpv2, &depv);
    if(sentlen == E_ERR) {
        perror("Failed to send network header packet.\n");
        exit(-1);
    }
}
UPDATE: my main file, starting either threads or processes (fork)
int main(void) {
    port_config pConfig;
    memset(&pConfig, 0, sizeof(pConfig));
    pConfig.inPort = 2002;
    pConfig.outPort = 2003;

    pid_t retval = fork();
    if(retval == 0) {
        // child process
        pc2wsuThread((void *) &pConfig);
    } else if (retval < 0) {
        perror("fork not successful\n");
    } else {
        // parent process
        wsu2pcThread((void *) &pConfig);
    }

    /*
    wint8 rc1, rc2 = 0;
    pthread_t pc2wsu;
    pthread_t wsu2pc;

    rc1 = pthread_create(&pc2wsu, NULL, pc2wsuThread, (void *) &pConfig);
    rc2 = pthread_create(&wsu2pc, NULL, wsu2pcThread, (void *) &pConfig);

    if(rc1) {
        printf("error: pthread_create() is %d\n", rc1);
        return(-1);
    }
    if(rc2) {
        printf("error: pthread_create() is %d\n", rc2);
        return(-1);
    }

    pthread_join(pc2wsu, NULL);
    pthread_join(wsu2pc, NULL);
    */

    return 0;
}
Does it help?
update 05/30/2011
-sh-2.05b# ./wsuproxy 192.168.1.100
mgmtsrvc
mgmtsrvc
Failed to send network header packet.
: Interrupted system call
13.254158,75.165482,DATAAAAAAmgmtsrvc
mgmtsrvc
mgmtsrvc
Still get the interrupted system call, as you can see above.
I blocked all signals as follows:
sigset_t signal_mask;
sigfillset(&signal_mask);
sigprocmask(SIG_BLOCK, &signal_mask, NULL);
The two threads are working on the same interfaces, but on different ports. The problem still appears in the same place (see the first code snippet). I can't get any further and don't have enough knowledge to solve the problem. Maybe some of you can help me here again.
Thanks in advance.
EINTR does not itself indicate an error. It means that your process received a signal while it was in the sendto syscall, and that syscall hadn't sent any data yet (that's important).
You could retry the send in this case, but a good thing would be to figure out what signal caused the interruption. If this is reproducible, try using strace.
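For example (socketex being the binary from the output above):

strace -f ./socketex

Delivered signals show up in the trace as lines like "--- SIGTERM ... ---", which tells you which signal interrupted the call.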
If you're the one sending the signal, well, you know what to do :-)
Note that on linux, you can receive EINTR on sendto (and some other functions) even if you haven't installed a handler yourself. This can happen if:
the process is stopped (via SIGSTOP for example) and restarted (with SIGCONT)
you have set a send timeout on the socket (via SO_SNDTIMEO)
See the signal(7) man page (at the very bottom) for more details.
So if you're "suspending" your service (or something else is), that EINTR is expected and you should restart the call.
Keep in mind, if you are using threads with signals, that a given signal, when delivered to the process, could be delivered to any thread whose signal mask is not blocking it. That means if you have blocked incoming signals in one thread and not in another, the non-blocking thread will receive the signal, and if there is no signal handler set up for it, you end up with the default behavior of that signal for the entire process (i.e., all the threads, both those blocking the signal and those not blocking it). For instance, if the default behavior of a signal is to terminate the process, one thread catching that signal and executing its default behavior will terminate the entire process, all the threads, even though some threads may have been masking the signal. Also, if you have two threads that are not blocking a signal, it is not deterministic which thread will handle it. Therefore it's typically the case that mixing signals and threads is not a good idea, but there are exceptions to the rule.
One thing you can try, since the signal mask for a spawned thread is inherited from the generating thread, is to create a daemon thread for handling signals: at the start of your program, block all incoming signals (or at least all non-important ones), and then spawn your threads. Those spawned threads will then ignore any incoming signals in the parent thread's blocked signal mask. If you need to handle some specific signals, you can still make those signals part of the blocked signal mask for the main process and then spawn your threads. But when you spawn the threads, leave one thread (it could even be the main thread, after it has spawned all the worker threads) as a "daemon" thread waiting for those specific incoming (and now blocked) signals using sigwait(). That thread then dispatches whatever functions are necessary when a given signal is received by the process. This avoids signals interrupting system calls in your other worker threads, yet still allows you to handle signals.
The reason your forked version may not be having issues is that if a signal arrives at the parent process, it is not propagated to any child processes. So I would try, if you can, to see what signal is terminating your system call, and in your threaded version block that signal; if you need to handle it, create a daemon thread that will handle that signal's arrival, with the rest of the threads blocking it.
Finally, if you don't have access to any external libraries or debuggers, etc., to see what signals are arriving, you can set up a simple procedure to see what signals might be arriving. You can try this code:
#include <signal.h>
#include <stdio.h>

int main()
{
    //block all incoming signals
    sigset_t signal_mask;
    sigfillset(&signal_mask);
    sigprocmask(SIG_BLOCK, &signal_mask, NULL);

    //... spawn your threads here ...

    //... now wait for signals to arrive and see what comes in ...
    int arrived_signal;
    while(1) //you can change this condition to whatever to exit the loop
    {
        sigwait(&signal_mask, &arrived_signal);
        switch(arrived_signal)
        {
            case SIGABRT: fprintf(stderr, "SIGABRT signal arrived\n"); break;
            case SIGALRM: fprintf(stderr, "SIGALRM signal arrived\n"); break;
            //continue for the rest of the signals defined in signal.h ...
            default: fprintf(stderr, "Unrecognized signal arrived\n");
        }
    }

    //clean-up your threads and anything else needing clean-up
    return 0;
}
The code below is from the book Advanced Programming in the UNIX Environment by W. Richard Stevens.
About this code the book says:
"If the signal is sent to the process while it is blocked, the signal delivery will be deferred until the signal is unblocked. To the application, this can look as if the signal occurs between the unblocking and the pause (depending on how the kernel implements signals). If this happens, or if the signal does occur between the unblocking and the pause, we have a problem. Any occurrence of the signal in this window of time is lost in the sense that we might not see the signal again, in which case the pause will block indefinitely. This is another problem with the earlier unreliable signals."
And it recommends using sigsuspend() before resetting the signal mask, instead of pause(), since it resets the signal mask and puts the process to sleep in a single atomic operation. But I don't want my process to wait for a signal after stepping out of the critical region. So is this problem valid for my case too? If so, what should I use so as not to lose a signal while resetting the signal mask with sigprocmask()?
sigset_t newmask, oldmask;

sigemptyset(&newmask);
sigaddset(&newmask, SIGINT);

/* block SIGINT and save current signal mask */
if (sigprocmask(SIG_BLOCK, &newmask, &oldmask) < 0)
    err_sys("SIG_BLOCK error");

/* critical region of code */

/* reset signal mask, which unblocks SIGINT */
if (sigprocmask(SIG_SETMASK, &oldmask, NULL) < 0)
    err_sys("SIG_SETMASK error");

/* window is open */
pause(); /* wait for signal to occur */

/* continue processing */
sigsuspend is used in order to avoid a race between sigprocmask and pause. If you don't need to halt your thread until a signal is received, there is nothing sigsuspend can do for you. You don't give enough information to know whether there are other sources of trouble in your context or not.
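For reference, the sigsuspend() version the book recommends, for the case where you do want to wait, replaces the final sigprocmask()/pause() pair with a single atomic operation (a sketch reusing the variables above):

/* still holding SIGINT blocked, right after the critical region: */
if (sigsuspend(&oldmask) != -1)   /* atomically unblock SIGINT and sleep */
    err_sys("sigsuspend error");  /* sigsuspend always returns -1 */

/* a signal was caught and handled; restore the original mask */
if (sigprocmask(SIG_SETMASK, &oldmask, NULL) < 0)
    err_sys("SIG_SETMASK error");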