Handling 'interrupted system call' error when using timer

I'm writing an application that uses a timer to do some data acquisition and processing at a fixed sample rate (200 Hz).
The application acts like a server and runs in the background. It should be controllable from other processes or other machines over UDP.
To do so, I use the timer_create() API to generate SIGUSR1 periodically and call a handler that does the acquisition and the processing.
The code to configure the timer is as follows (minus error checks for clarity):
sa.sa_flags = SA_SIGINFO;
sa.sa_sigaction = handler;
sigemptyset(&sa.sa_mask);
sigaction(SIGUSR1, &sa, NULL);
sev.sigev_notify = SIGEV_SIGNAL;
sev.sigev_signo = SIGUSR1;
sev.sigev_value.sival_ptr = &timerid;
timer_create(CLOCK_REALTIME, &sev, &timerid);
timer_settime(...)
The code above is called when a 'start' command is received over UDP. To check for commands, I have an infinite loop in my main program that calls the recvfrom() syscall.
The problem is that once a 'start' command has been received and the timer is properly started and running (using the code above), I get an 'interrupted system call' error (EINTR) because the SIGUSR1 signal sent by the timer interrupts the recvfrom() call. If I check for this particular error code and ignore it, I eventually get a 'connection refused' error when calling recvfrom().
So here are my questions:
How do I solve this 'interrupted system call' error, given that ignoring it and redoing the recvfrom() doesn't seem to work?
Why do I get the 'connection refused' error after about twenty tries?
I have the feeling that using SIGEV_THREAD could be a solution since, as I understand it, it creates a new thread (like pthread_create) without generating a signal. Am I right?
Is the signal number important here? Is there any advantage to using a real-time signal?
Is there any other way to do what I intend to do: having a background loop checking for commands from UDP and a real-time periodic task?
And here is the bonus question:
Is it safe to do the data acquisition and the processing in the handler, or should I use a semaphore mechanism to wake up a thread that does it?
Solution:
As suggested in an answer and in the comments, using SA_RESTART seems to fix the main issue.
Solution 2:
Using SIGEV_THREAD instead of SIGEV_SIGNAL works too. I've read somewhere that SIGEV_THREAD could require more resources than SIGEV_SIGNAL. However, I have not seen a significant difference in the timing of the task.

Timers tend to be implemented using SIGALRM.
Signal receipt, including SIGALRM, tends to cause long-running system calls to return early with EINTR in errno.
SA_RESTART is one way around this: system calls interrupted by receipt of a signal will be automatically restarted. Another is to check errno for EINTR after a system call fails and restart the call when you see it.
With read() and write(), of course, you can't always just restart; after a partial transfer you need to pick up where you left off. That's why these return the length of data transferred.
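Both approaches can be sketched roughly as follows. This is only a minimal illustration: install_handler() and recvfrom_retry() are hypothetical helper names, not part of any library, and error handling is left to the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: install `handler` for `signo`. When `restart` is
 * non-zero, SA_RESTART asks the kernel to restart interrupted system calls. */
int install_handler(int signo, void (*handler)(int), int restart)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = restart ? SA_RESTART : 0;
    return sigaction(signo, &sa, NULL);
}

/* Hypothetical helper: call recvfrom() again whenever it fails with EINTR. */
ssize_t recvfrom_retry(int fd, void *buf, size_t len, int flags,
                       struct sockaddr *src, socklen_t *srclen)
{
    ssize_t n;

    do {
        n = recvfrom(fd, buf, len, flags, src, srclen);
    } while (n < 0 && errno == EINTR);
    return n;
}
```

Either helper alone should be enough for a blocking recvfrom() loop; using both does no harm.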

Given that you're using Linux, I would opt for using timerfd_create instead.
That way you can just select(2), poll(2) or epoll(7) instead and handle timer events without the difficulty of signal handlers in your main loop.
As for EINTR (Interrupted System Call), those are properly handled by just restarting the specific system call that got interrupted.
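A rough sketch of that idea, assuming Linux and a period below one second (5 ms for the 200 Hz case). The helper names make_periodic_timerfd() and wait_events() and the EV_* constants are made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <stdint.h>
#include <sys/timerfd.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

enum { EV_TIMER = 1, EV_SOCKET = 2 };   /* illustrative event bits */

/* Create a timerfd that expires every `period_ns` nanoseconds
 * (assumed to be greater than 0 and less than one second). */
int make_periodic_timerfd(long period_ns)
{
    struct itimerspec its;
    int fd = timerfd_create(CLOCK_MONOTONIC, 0);

    if (fd < 0)
        return -1;
    its.it_interval.tv_sec = 0;
    its.it_interval.tv_nsec = period_ns;
    its.it_value = its.it_interval;     /* first expiry after one period */
    if (timerfd_settime(fd, 0, &its, NULL) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Block until the timer or the socket is readable. Returns a mask of EV_*
 * bits, or -1 on error. On a timer event, *expirations receives the number
 * of periods elapsed since the previous read. */
int wait_events(int timer_fd, int sock_fd, uint64_t *expirations)
{
    struct pollfd fds[2] = {
        { .fd = timer_fd, .events = POLLIN },
        { .fd = sock_fd,  .events = POLLIN },
    };
    int ev = 0;

    if (poll(fds, 2, -1) < 0)
        return -1;
    if ((fds[0].revents & POLLIN) &&
        read(timer_fd, expirations, sizeof *expirations) ==
            (ssize_t)sizeof *expirations)
        ev |= EV_TIMER;
    if (fds[1].revents & POLLIN)
        ev |= EV_SOCKET;
    return ev;
}
```

The main loop would then call wait_events() repeatedly, run the acquisition when EV_TIMER is set, and call recvfrom() when EV_SOCKET is set. No signal handler is involved, so recvfrom() is never interrupted by the timer.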

Restarting the interrupted system call is the correct response to EINTR. Your "Connection Refused" problem is an unrelated error: on a UDP socket, it indicates that a previous packet sent on that socket was rejected by the destination (notified through an ICMP message).

Question 5: Your use of a message and real-time periodic thread is perfectly fine. However, I would suggest you avoid using timers altogether, precisely because they use signals. I've run into this problem myself and eventually replaced the timer with a simple clock_nanosleep() that uses TIMER_ABSTIME with time updated to maintain the desired rate (i.e. add the period to the absolute time). The result was simpler code, no more problems with signals, and a more accurate timer than the signal-based timer. BTW, you should measure your timer's period in the handler to make sure it is accurate enough. My experience with timers was 8 years ago, so the problem with accuracy might be fixed. However, the other problems with signals are inherent to signals themselves and thus can't be "solved" -- only worked around.
Also, I see no problem with doing data acquisition from the handler, it should certainly reduce latency in retrieving the data.
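The clock_nanosleep() approach could look roughly like this. It is a sketch, not the answerer's actual code: timespec_add_ns() and wait_next_period() are hypothetical helpers, and the period is assumed to be well below one second.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/* Advance an absolute deadline by `period_ns`, keeping tv_nsec in range. */
void timespec_add_ns(struct timespec *t, long period_ns)
{
    t->tv_nsec += period_ns;
    while (t->tv_nsec >= 1000000000L) {
        t->tv_nsec -= 1000000000L;
        t->tv_sec += 1;
    }
}

/* Sleep until the next absolute deadline. Because the deadline is absolute,
 * restarting after EINTR does not lengthen the period and the time spent in
 * the task does not accumulate as drift. Returns 0 or an error number. */
int wait_next_period(struct timespec *next, long period_ns)
{
    int rc;

    timespec_add_ns(next, period_ns);
    do {
        /* clock_nanosleep() returns the error number instead of setting errno */
        rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, next, NULL);
    } while (rc == EINTR);
    return rc;
}
```

A periodic thread would initialise `next` once with clock_gettime(CLOCK_MONOTONIC, &next) and then loop on wait_next_period(&next, 5000000L) followed by the acquisition step.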

Related

How to stop poll() from being interrupted by a specific signal

I have a C application using poll to wait for some data.
Currently I am implementing the rest of my application into this one and I use time based interrupts (SIGRTMIN). As expected poll() returns if one of my other timers call back.
How can I stop poll from doing that? I am reading a lot about ppoll(), but not sure how to use that... Can I use this to stop this function from returning when a timer event is fired?
(I do not have any problems with the poll being delayed a few ms)

If a thread / process blocking in poll() receives an unblocked signal then poll() will be interrupted. If you don't want that to happen then you can block the desired signal before calling poll(), and then unblock it after poll() returns (see sigprocmask()). Note, however, that that won't cause poll() to be delayed -- quite the opposite. If anything, it will cause receipt of the signal to be delayed. If poll() blocks long enough then it could cause multiple RT signals to queue up, so that after you unblock that signal you receive it multiple times in quick succession.
You should consider instead checking poll()'s return value (which you should always do anyway) and retrying if it is -1 with errno set to EINTR.

Create new signal or multiplex SIGALRM?

I am trying to write a benchmark that receives a signal from the kernel telling it to adjust its parameters. I'm trying to study whether a proactive or reactive approach works best.
In the proactive approach, I use setitimer to set an alarm periodically and force the benchmark to look at its performance thus far and re-tune itself.
In the reactive approach, the kernel periodically monitors the process and signals it if it is performing poorly.
Since I've been using the setitimer functionality, and since setitimer causes SIGALRM, I have asked the kernel to throw a SIGALRM in the reactive approach. This has been working fine. However, now I need to use SIGALRM to run the benchmark for a specific duration of time.
Is there a way to multiplex SIGALRM to serve both purposes: to do a timed run and terminate, and to re-tune? Is there a function/syscall similar to setitimer that allows the user to set an alarm but with a custom signal?

Yes. You want to look at the timer_create / timer_settime etc., family of calls.
The 2nd parameter of timer_create is a struct sigevent. Its sigev_signo field can be set to choose which signal is sent on timer expiration.

Running select() socket and timers in the same linux thread

I am writing code on ucLinux for socket communication. I use select() for reading the data on sockets. I also have a 20 msec timer (created using setitimer) running in the same thread for performing a parallel operation. My select() call fails each time with "Interrupted system call", since it receives the SIGALRM signal issued by the timer on expiry, every 20 msec. I tried restarting the system call when EINTR is returned and running select() again, but this won't help, since I will always receive the SIGALRM from the timer every 20 msec. I don't want to ignore this signal since it is used for performing other tasks in the system, but I want to use select() without being affected by this signal. Is there any way to handle this? I cannot use functions like timer_create() as these are not supported on the platform I am using. So I am stuck with using setitimer for timer creation. Is there any way I can run both together independently in my code?

What you're doing is pretty weird. Let's face it: timers are an ancient and mostly-obsolete mechanism for doing work. Pretty much everyone these days avoids signals like the plague. There's essentially nothing useful you can do in a signal callback (you certainly can't call anything complicated like malloc for example), so you must have some way to get the timer notification back from the SIGALRM handler to the main thread already -- you're not actually doing the work in the signal handler are you?
So you have two tactics: use the standard self-pipe trick to turn the signal into an event on an fd, the "normal" way to handle things like SIGTERM, SIGINT and so on. You call socketpair or pipe to make a pipe, then write a byte into the pipe from the signal handler. You read the byte back from your select loop. You commonly write the value of the signal as the data, but you could write anything really.
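A bare-bones sketch of the self-pipe trick, using made-up names (self_pipe, self_pipe_init(), wait_signal_or_fd()). It assumes a single signal number that fits in one byte and leaves out cleanup.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

static int self_pipe[2] = { -1, -1 };

/* Handler: only async-signal-safe work, i.e. write one byte to the pipe. */
static void on_signal(int signo)
{
    int saved = errno;
    unsigned char b = (unsigned char)signo;
    ssize_t r = write(self_pipe[1], &b, 1);

    (void)r;                    /* a full pipe just drops the notification */
    errno = saved;
}

/* Create the pipe (non-blocking at both ends) and install the handler. */
int self_pipe_init(int signo)
{
    struct sigaction sa;

    if (pipe(self_pipe) != 0)
        return -1;
    for (int i = 0; i < 2; i++) {
        int fl = fcntl(self_pipe[i], F_GETFL);
        if (fl < 0 || fcntl(self_pipe[i], F_SETFL, fl | O_NONBLOCK) < 0)
            return -1;
    }
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(signo, &sa, NULL);
}

/* Wait for the signal pipe or `fd`. Returns the signal number read from
 * the pipe, 0 if `fd` is readable, or -1 on error. */
int wait_signal_or_fd(int fd)
{
    for (;;) {
        fd_set rfds;
        int maxfd = fd > self_pipe[0] ? fd : self_pipe[0];
        unsigned char b;

        FD_ZERO(&rfds);
        FD_SET(self_pipe[0], &rfds);
        FD_SET(fd, &rfds);
        if (select(maxfd + 1, &rfds, NULL, NULL, NULL) < 0) {
            if (errno == EINTR)
                continue;       /* the byte is already in the pipe: loop */
            return -1;
        }
        if (FD_ISSET(self_pipe[0], &rfds) && read(self_pipe[0], &b, 1) == 1)
            return b;
        if (FD_ISSET(fd, &rfds))
            return 0;
    }
}
```

With this in place, the periodic work runs in the main loop when wait_signal_or_fd() returns the signal number, not inside the handler.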
The other tactic (much more sane) is to avoid the mess with signals and setitimer completely. setitimer is seriously legacy and causes problems for all sorts of things (e.g. it can cause functions like getaddrinfo to hang, a bug that still hasn't been fixed in glibc (http://www.cygwin.org/frysk/bugzilla/show_bug.cgi?id=15819)). Signals are bad for your health. So the "normal" tactic is to use the timeout argument to select. You have a linked list of timers, objects you use to manage periodic events in your code. When you call select, you use as the timeout the shortest of your remaining timers. When the select call returns, you check if any timers are expired and call the timer handler as well as the handlers for your fd events. That's a standard application event loop. This way your loop can listen for timer-driven events as well as fd-driven events. Pretty much every application on your system uses some variant on this mechanism.
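One iteration of such a loop might be sketched as below. For brevity the timers are a plain array of absolute CLOCK_MONOTONIC deadlines, not a linked list; timeout_until() and select_until_next_timer() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/select.h>
#include <time.h>

/* Time remaining from `now` until `deadline`, clamped at zero. */
struct timeval timeout_until(const struct timespec *now,
                             const struct timespec *deadline)
{
    long long ns = (long long)(deadline->tv_sec - now->tv_sec) * 1000000000LL
                 + (deadline->tv_nsec - now->tv_nsec);
    struct timeval tv = { 0, 0 };

    if (ns > 0) {
        tv.tv_sec = ns / 1000000000LL;
        tv.tv_usec = (ns % 1000000000LL) / 1000;
    }
    return tv;
}

/* select() on `rfds`, using the nearest of the `n` deadlines as timeout.
 * Returns what select() returns: 0 means a timer is due, a positive value
 * means descriptors are ready, -1 means error. With n == 0 it waits
 * without a timeout. */
int select_until_next_timer(int nfds, fd_set *rfds,
                            const struct timespec *deadlines, size_t n)
{
    struct timespec now;
    struct timeval tv = { 0, 0 }, *tvp = NULL;

    if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
        return -1;
    for (size_t i = 0; i < n; i++) {
        struct timeval t = timeout_until(&now, &deadlines[i]);
        if (tvp == NULL || t.tv_sec < tv.tv_sec ||
            (t.tv_sec == tv.tv_sec && t.tv_usec < tv.tv_usec)) {
            tv = t;
            tvp = &tv;
        }
    }
    return select(nfds, rfds, NULL, NULL, tvp);
}
```

After each return the caller would run the handlers of any expired timers, push their deadlines forward by one period, rebuild the fd_set, and loop.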

Would something like this be an option for you?
while (1) {
    /* On Linux, select() modifies the fd sets and the timeout, so
       reinitialize them before each call */
    int rc = select(nfds, &readfds, &writefds, &exceptfds, &timeout);
    if ((rc < 0) && (errno == EINTR))
        continue;
    else {
        // some instructions
    }
}
If this is not an option for you, you can probably use pselect, which adds a final parameter (sigmask) that specifies the signal mask to install for the duration of the pselect() call, so signals in that mask stay blocked while it waits.

Linux select() vs ppoll() vs pselect()

In my application, there is an io-thread that is dedicated to:
Wrapping data received from the application in a custom protocol
Sending the data+custom protocol packet over tcp/ip
Receiving data+custom protocol packet over tcp/ip
Unwrapping the custom protocol and handing the data to the application.
Application processes the data over a different thread. Additionally, the requirements dictate that the unacknowledged window size should be 1, i.e. there should be only one pending unacknowledged message at anytime. This implies that if io-thread has dispatched a message over the socket, it will not send any more messages, till it hears an ack from the receiver.
The application's processing thread communicates with the io-thread via a pipe. The application needs to shut down gracefully if someone types Ctrl+C on the Linux CLI.
Thus, given these requirements, I have the following options:
Use ppoll() on socket and pipe descriptors
Use select()
Use pselect()
I have the following questions:
The decision between select() and poll(). My application only deals with fewer than 50 file descriptors. Is it okay to assume there would be no difference whether I choose select or poll?
The decision between select() and pselect(). I read the Linux documentation, and it mentions a race condition between signals and select(). I don't have experience with signals, so can someone explain the race condition with select() more clearly? Does it have something to do with someone pressing Ctrl+C on the CLI and the application not stopping?
The decision between pselect() and ppoll(). Any thoughts on one vs. the other?

I'd suggest by starting the comparison with select() vs poll(). Linux also provides both pselect() and ppoll(); and the extra const sigset_t * argument to pselect() and ppoll() (vs select() and poll()) has the same effect on each "p-variant", as it were. If you are not using signals, you have no race to protect against, so the base question is really about efficiency and ease of programming.
Meanwhile there's already a stackoverflow.com answer here: what are the differences between poll and select.
As for the race: once you start using signals (for whatever reason), you will learn that in general, a signal handler should just set a variable of type volatile sig_atomic_t to indicate that the signal has been detected. The fundamental reason for this is that many library calls are not re-entrant, and a signal can be delivered while you're "in the middle of" such a routine. For instance, simply printing a message to a stream-style data structure such as stdout (C) or cout (C++) can lead to re-entrancy issues.
Suppose you have code that uses a volatile sig_atomic_t flag variable, perhaps to catch SIGINT, something like this (see also http://pubs.opengroup.org/onlinepubs/007904975/functions/sigaction.html):
volatile sig_atomic_t got_interrupted = 0;
void caught_signal(int unused) {
    got_interrupted = 1;
}
...
struct sigaction sa;
sa.sa_handler = caught_signal;
sigemptyset(&sa.sa_mask);
sa.sa_flags = SA_RESTART;
if (sigaction(SIGINT, &sa, NULL) == -1) ... handle error ...
...
Now, in the main body of your code, you might want to "run until interrupted":
while (!got_interrupted) {
    ... do some work ...
}
This is fine up until you start needing to make calls that wait for some input/output, such as select or poll. The "wait" action needs to wait for that I/O—but it also needs to wait for a SIGINT interrupt. If you just write:
while (!got_interrupted) {
    ... do some work ...
    result = select(...); /* or result = poll(...) */
}
then it's possible that the interrupt will happen just before you call select() or poll(), rather than afterward. In this case, you did get interrupted—and the variable got_interrupted gets set—but after that, you start waiting. You should have checked the got_interrupted variable before you started waiting, not after.
You can try writing:
while (!got_interrupted) {
    ... do some work ...
    if (!got_interrupted)
        result = select(...); /* or result = poll(...) */
}
This shrinks the "race window", because now you'll detect the interrupt if it happens while you're in the "do some work" code; but there is still a race, because the interrupt can happen right after you test the variable, but right before the select-or-poll.
The solution is to make the "test, then wait" sequence "atomic", using the signal-blocking properties of sigprocmask (or, in POSIX threaded code, pthread_sigmask):
sigset_t mask, omask;
...
while (!got_interrupted) {
    ... do some work ...
    /* begin critical section, test got_interrupted atomically */
    sigemptyset(&mask);
    sigaddset(&mask, SIGINT);
    if (sigprocmask(SIG_BLOCK, &mask, &omask))
        ... handle error ...
    if (got_interrupted) {
        sigprocmask(SIG_SETMASK, &omask, NULL); /* restore old signal mask */
        break;
    }
    result = pselect(..., &omask); /* or ppoll() etc */
    sigprocmask(SIG_SETMASK, &omask, NULL);
    /* end critical section */
}
(the above code is actually not that great, it's structured for illustration rather than efficiency -- it's more efficient to do the signal mask manipulation slightly differently, and place the "got interrupted" tests differently).
Until you actually start needing to catch SIGINT, though, you need only compare select() and poll() (and if you start needing large numbers of descriptors, some of the event-based stuff like epoll() is more efficient than either one).

Between (p)select and (p)poll there are some rather subtle differences:
For select, you have to initialize and populate the ugly fd_set bitmaps every time before you call select, because select modifies them in place in a "destructive" fashion. (poll distinguishes between the .events and .revents members in struct pollfd.)
After selecting, the entire bitmap is often scanned (by people/code) for events even if most of the fds are not even watched.
Third, the bitmap can only deal with fds whose number is less than a certain limit (contemporary implementations: somewhere between 1024..4096), which rules it out in programs where high fds can be easily attained (notwithstanding that such programs are likely to already use epoll instead).

The accepted answer is not correct vis-a-vis the difference between select and pselect. It does describe well how a race condition between the signal handler and select can arise, but it is incorrect in how it uses pselect to solve the problem. It misses the main point about pselect, which is that it waits for EITHER the file descriptor or the signal to become ready. pselect returns when either of these is ready. Select ONLY waits on the file descriptor. Select ignores signals. See this blog post for a good working example:
https://www.linuxprogrammingblog.com/code-examples/using-pselect-to-avoid-a-signal-race

To make the picture presented by the accepted answer complete, the following basic fact should be mentioned: both select() and pselect() may return EINTR, as stated in their man pages:
EINTR A signal was caught; see signal(7).
This "caught" means that the signal should be recognized as "occurred during the system call execution":
1. If non-masked signal occurs during select/pselect execution then select/pselect will exit.
2. If non-masked signal occurs before select/pselect has been called this will not have any effect and select/pselect will continue waiting, potentially forever.
So if a signal occurs during select/pselect execution we are ok: the execution of select/pselect will be interrupted, and then we can test the reason for the exit, discover that it was EINTR, and exit the loop.
The real threat that we face is a possibility of signal occurrence outside of select/pselect execution, then we may hang in the system call forever. Any attempt to discover this "outsider" signal by naive means:
if (was_a_signal) {
...
}
will fail since no matter how close this test will be to the call of select/pselect there is always a possibility that the signal will occur just after the test and before the call to select/pselect.
Then, if the only place to catch the signal is during select/pselect execution we should invent some kind of "wine funnel" so all "wine splashes" (signals), even outside of "bottle neck" (select/pselect execution period) will eventually come to the "bottle neck".
But how can you deceive system call and make it "think" that the signal has occurred during this system call execution when in reality it has occurred before?
Easy. Here is our "wine funnel": you just block the signal of interest and by that cause it (if it has occurred at all) waiting outside of the process "for the door to be opened" and you "open the door" (unmask the signal) only when you're prepared "to welcome the guest" (select/pselect is running). Then the "arrived" signal will be recognized as "just occurred" and will interrupt the execution of the system call.
Of course, "opening the door" is the most critical part of the plan: it cannot be done by the usual means (first unmask, then call select/pselect). The only possibility is to do both actions (unmask and system call) at once (atomically). This is what pselect() is capable of but select() is not.
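The "funnel" can be sketched in a few lines. The names below (got_signal, funnel_init(), wait_ready_or_signal()) are invented for illustration, and a single-threaded program is assumed.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>

static volatile sig_atomic_t got_signal = 0;

static void note_signal(int signo)
{
    (void)signo;
    got_signal = 1;
}

/* "Close the door": install the handler and block `signo`. On return,
 * *unblocked holds the previous mask with `signo` removed, which is the
 * mask to hand to pselect() later. */
int funnel_init(int signo, sigset_t *unblocked)
{
    struct sigaction sa;
    sigset_t block;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = note_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(signo, &sa, NULL) != 0)
        return -1;
    sigemptyset(&block);
    sigaddset(&block, signo);
    if (sigprocmask(SIG_BLOCK, &block, unblocked) != 0)
        return -1;
    sigdelset(unblocked, signo);
    return 0;
}

/* "Open the door" only inside pselect(): the mask swap and the wait happen
 * atomically, so a signal that arrived earlier (and is still pending)
 * interrupts the call right away. Returns the number of ready descriptors,
 * 0 if the signal was caught, or -1 on another error. */
int wait_ready_or_signal(int nfds, fd_set *rfds, const sigset_t *unblocked)
{
    int rc = pselect(nfds, rfds, NULL, NULL, NULL, unblocked);

    if (rc < 0 && errno == EINTR && got_signal)
        return 0;
    return rc;
}
```

Outside of pselect() the signal stays blocked, so the handler can only run while the program is waiting inside the call.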

Trying to exit from a blocking UDP socket read

This is a question similar to Proper way to close a blocking UDP socket. I have a thread in C which is reading from a UDP socket. The read is blocking. I would like to know if it is possible to exit the thread without relying on recv() returning. For example, can I close the socket from another thread and safely expect the socket read thread to exit? I didn't see any highly voted answer on that thread, which is why I am asking again.

This really depends on what system you're running under. For example, if you're running under a POSIX-compliant system and your thread is cancelable, the recv() call will be interrupted when you cancel the thread since it's a cancel point.
If you're using an older socket implementation, you could set a signal handler for your thread for something like SIGUSR1 and hope nobody else wanted it and signal, since recv() will interrupt on a signal. Your best option is not to block, if at all possible.

I don't think closing a socket involved in a blocking operation is a safe guaranteed way of terminating the operation. For instance, kernel.org warns darkly:
It is probably unwise to close file descriptors while they may be in use by system calls in other threads in the same process. Since a file descriptor may be reused, there are some obscure race conditions that may cause unintended side effects.
Instead you could use a signal and make recv fail with EINTR (make sure SA_RESTART is not enabled). You can send a signal to a specific thread with pthread_kill.
You could enable SO_RCVTIMEO on the socket before starting the recv call.
Personally I usually try to stay clear of all the signal nastiness but it's a viable option.
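The SO_RCVTIMEO option could be used along these lines. This is a sketch with hypothetical helper names and a stop flag that another thread is assumed to set; the return value -2 for "stopped" is an arbitrary convention chosen for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdatomic.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Make blocking receives on `fd` give up after `ms` milliseconds. */
int set_recv_timeout(int fd, long ms)
{
    struct timeval tv;

    tv.tv_sec = ms / 1000;
    tv.tv_usec = (ms % 1000) * 1000;
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);
}

/* Receive one datagram, waking up at every timeout to check `*stop`.
 * Returns the datagram length, -2 once stop was requested, or -1 on a
 * real error. */
ssize_t recv_until_stopped(int fd, void *buf, size_t len, atomic_int *stop)
{
    while (!atomic_load(stop)) {
        ssize_t n = recv(fd, buf, len, 0);

        if (n >= 0)
            return n;
        if (errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR)
            return -1;
        /* timeout or signal: go around and re-check the flag */
    }
    return -2;
}
```

The shorter the timeout, the faster the thread reacts to the stop request, at the price of more wake-ups while idle.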

You've got a couple of options for that. A signal will interrupt the read operation, so all you need to do is make sure a signal goes off. The recv operation should fail with error number EINTR.
The simplest option is to set up a timer to interrupt your own process after some timeout e.g. 30 seconds:
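struct itimerval timer;
timer.it_value.tv_sec = 30;      /* first (and only) expiry after 30 seconds */
timer.it_value.tv_usec = 0;
timer.it_interval.tv_sec = 0;    /* zero interval: one-shot, no reload */
timer.it_interval.tv_usec = 0;
if (setitimer(ITIMER_REAL, &timer, NULL) != 0)
    printf("failed to start timer\n");
You'll get a SIGALRM after the specified time, which will interrupt your blocking operation and give you the chance to repeat the operation or quit. Note that a handler must be installed for SIGALRM (without SA_RESTART) for this to work, since the default action for SIGALRM is to terminate the process.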

You cannot deallocate a shared resource while another thread is or might be using it. In practice, you will find that you cannot even write code to do what you suggest.
Think about it. When you go to call close, how can you possibly know that the other thread is actually blocked in recv? What if it's about to call recv, but then another thread calls socket and gets the descriptor you just closed? Now, not only will that thread not detect any error, but it will be calling recv on the wrong socket!
There is probably a good way to solve your outer problem, the reason you need to exit from a blocking UDP socket read. There are also several ugly hacks available. The basic approach is to make the socket non-blocking and instead of making a blocking UDP socket read, fake a blocking read with select or poll. You can then abort this loop several ways:
One way is to have select time out and check an 'abort' flag when select returns.
Another way is to also select on the read end of a pipe. Send a single byte to the pipe to abort the select.
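The pipe variant might look like this, written with poll() in place of select(), which works the same way for two descriptors. recv_or_abort() and request_abort() are names made up for the sketch, and MSG_DONTWAIT (a Linux/BSD flag) is assumed to be available for the non-blocking receive.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Wait for a datagram on `sock` or a wake-up on the pipe read end
 * `wake_rd`. Returns the datagram length, -2 if an abort was requested,
 * or -1 on error. */
ssize_t recv_or_abort(int sock, int wake_rd, void *buf, size_t len)
{
    struct pollfd fds[2] = {
        { .fd = wake_rd, .events = POLLIN },
        { .fd = sock,    .events = POLLIN },
    };

    for (;;) {
        if (poll(fds, 2, -1) < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (fds[0].revents & (POLLIN | POLLHUP))
            return -2;          /* byte written, or write end closed */
        if (fds[1].revents & POLLIN) {
            ssize_t n = recv(sock, buf, len, MSG_DONTWAIT);
            if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
                continue;       /* spurious readiness: wait again */
            return n;
        }
        if (fds[1].revents)
            return -1;          /* POLLERR or POLLNVAL on the socket */
    }
}

/* Called from another thread to make recv_or_abort() return -2. */
int request_abort(int wake_wr)
{
    unsigned char b = 1;

    return write(wake_wr, &b, 1) == 1 ? 0 : -1;
}
```

The socket is only closed by the reading thread after recv_or_abort() has returned, which avoids the descriptor-reuse race described above.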

If you are on a POSIX-compliant system, you can try to monitor your thread:
Call pthread_create with a function that performs your recv, calls pthread_cond_signal just after, and then returns.
The calling thread calls pthread_cond_timedwait with the desired timeout and terminates the called thread if the wait timed out.
