Posix evtsuspend equivalent - c

I am migrating a LynxOS program to an Ubuntu distribution with gcc 4.1.3.
I am almost done, but I have a problem: I am receiving a SIGALRM signal that forces my program to exit. I don't know why I am receiving this signal, since I never call alarm(x).
I worked around this with a sigaction handler, but my program is still not working properly: mq_receive fails every time this SIGALRM is received.
I wonder if it could be because of this code translation:
#include <events.h>
#include <timers.h>
evtset_t EvtMask;
struct timespec Time;
Time.tv_sec = 2;
Time.tv_nsec = 0;
evtsuspend (&EvtMask, &Time);
would now be
sleep(2);
This is the info about evtsuspend given by LynxOS:
evtsuspend
(can't insert the image because of my lack of reputation)
Do you think they work the same? (without specifying an event mask) sleep() also waits for a SIGALRM to continue.
Thanks and regards

1) Try running strace on your program to see if you can find out more info.
It'd be nice to have more details about your program... but maybe this will help.
Maybe mq_receive() is timing out. I think that SIGALRM is used to notify applications for timed-out system calls.
Or, more likely, something else in your code is generating SIGALRM, e.g. setitimer().
As for your question about using sleep(2) on Linux: if you want the program to block during the sleep(2) call, then yes, you should be OK with using it. If you don't want it to block, start an interval timer with setitimer() and use that. Note: setitimer() with ITIMER_REAL delivers SIGALRM when the timer fires; see the man page for details.

I see this is an old topic, but here is an answer to the mq_receive() part of your question:
mq_receive() will unblock when a signal is received. When this happens, it returns -1 and sets errno to EINTR. You can wrap your call to mq_receive() in code that checks for this and calls it again if necessary.
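A minimal sketch of such a wrapper, assuming the caller simply wants to keep waiting whenever a signal interrupts the call. The name mq_receive_retry is made up for this illustration and is not part of any library:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <mqueue.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical helper: behaves like mq_receive(), but calls it again
 * whenever a signal handler interrupted the wait (errno == EINTR). */
ssize_t mq_receive_retry(mqd_t mq, char *buf, size_t len, unsigned int *prio)
{
    assert(buf != NULL);
    ssize_t n;
    do {
        n = mq_receive(mq, buf, len, prio);
    } while (n == -1 && errno == EINTR);
    return n;   /* message length, or -1 with errno set to a real error */
}
```

Any other error (for example EBADF or EMSGSIZE) is still reported to the caller unchanged. On older glibc versions the program may need to be linked with -lrt.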

Related

Why does `nanosleep` need the argument of `req` while kernel has the chance to restart the system call again internally(`-ERESTARTSYS`)?

As per the documentation (https://lwn.net/Articles/17744/): "nanosleep(), which is currently the only user of this mechanism, need only save the wakeup time in the restart block, along with pointers to the user arguments..".
If so, why does nanosleep need an argument req of type struct timespec *?
As per the Linux programmer's manual: "int nanosleep(const struct timespec *req, struct timespec *rem); If the call is interrupted by a signal handler, nanosleep() returns -1, sets errno to EINTR, and writes the remaining time into the structure pointed to by rem unless rem is NULL."
I think that if the kernel can restart the system call (do_nanosleep) internally, there is no need to return the remaining sleep time to user space. That's what I cannot understand.
ERESTARTSYS should never be seen from user code, you are correct. It is a flag for the kernel to restart a call itself, or return EINTR to user code. Please see this discussion on the Linux Kernel Mailing List:
So which way is it supposed to be (so someone can patch things up to make it consistent):
1. User space should never see ERESTARTSYS from any system call
Yes. The kernel either transforms it to EINTR, or restarts the syscall when the signal handler returns.
Or this article on LWN.net
What happens, though, if a signal is queued for the process while it
is waiting? In that case, the system call needs to abort its work and
allow the actual delivery of the signal. For this reason, kernel code
which sleeps tends to follow the sleep with a test like:
if (signal_pending(current)) return -ERESTARTSYS;
After the signal has been handled, the system call will be restarted
(from the beginning), and the user-space application need not deal
with "interrupted system call" errors. For cases where restarting is
not appropriate, a -EINTR return status will cause a (post-signal)
return to user space without restarting the system call.
I don't think any of this has to do with nanosleep(2) params other than that under the cover it uses this mechanism. The nanosleep docs tell you what the params do, req is how long you want to sleep, and rem is how long you have left if you get woken up early.
The title of the question doesn't entirely match the actual question. @dsolimano did answer the title.
However, it seems that you're asking why code that calls nanosleep() needs to handle a case like EINTR if ERESTARTSYS presumably solves the problem in the kernel.
Assuming that this is the question, the answer is that EINTR is not a problem to be solved: it is useful in its own right.
Here are a couple of use cases for EINTR:
You want to wait for a certain amount of time, but be able to handle signals synchronously (i.e. not in a signal handler). For example, you are waiting for a DB to initialize, but if the user presses Ctrl+C you want to show the current DB status and continue waiting.
You want to wait for a signal, but with a timeout. So you sleep for the timeout, but if nanosleep() returned EINTR you know you got a signal.
Regarding your "auxiliary questions", I'll tl;dr @dsolimano's answers:
What are the differences between ERESTARTSYS and EINTR?
ERESTARTSYS is a kernel implementation detail, EINTR is part of the kernel's API.
Is ERESTARTSYS only used in kernel or driver?
Yes.
Why does nanosleep() need an argument of type struct timespec *req?
req is the amount of time to sleep. You probably meant rem. The first use case I outlined above is an example of why it is needed.

Running select() socket and timers in the same linux thread

I am writing code on uClinux for socket communication. I use select() to read data from the sockets. I also have a 20 msec timer (created using setitimer) running in the same thread to perform a parallel operation. My select() call keeps failing with "Interrupted system call", because it receives the SIGALRM signal issued by the timer every 20 msec. I tried restarting the system call when EINTR is returned and running select() again, but this doesn't help, since I will always receive SIGALRM from the timer every 20 msec. I don't want to ignore this signal, since it is used for performing other tasks in the system, but I want to use select() without being affected by it. Is there any way to handle this? I cannot use functions like timer_create(), as they are not supported on the platform I am using, so I am stuck with setitimer for timer creation. Is there any way I can run both independently in my code?
What you're doing is pretty weird. Let's face it: timers are an ancient and mostly-obsolete mechanism for doing work. Pretty much everyone these days avoids signals like the plague. There's essentially nothing useful you can do in a signal callback (you certainly can't call anything complicated like malloc for example), so you must have some way to get the timer notification back from the SIGALRM handler to the main thread already -- you're not actually doing the work in the signal handler are you?
So you have two tactics. The first is the standard self-pipe trick, which turns the signal into an event on an fd; it is the "normal" way to handle things like SIGTERM, SIGINT and so on. You call socketpair or pipe to make a pipe, then write a byte into the pipe from the signal handler. You read the byte back in your select loop. You commonly write the signal number as the data, but you could write anything really.
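A rough sketch of the self-pipe trick, under the assumption that a single global pipe is enough. The names wake_pipe, self_pipe_install and self_pipe_wait are invented for this example:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

static int wake_pipe[2] = { -1, -1 };

/* Signal handler: only async-signal-safe work, i.e. one write() to the pipe. */
static void on_signal(int signo)
{
    int saved_errno = errno;
    unsigned char byte = (unsigned char)signo;
    ssize_t n = write(wake_pipe[1], &byte, 1);
    (void)n;                      /* a full pipe just drops the notification */
    errno = saved_errno;
}

/* Create the pipe (non-blocking on both ends) and route signo into it. */
int self_pipe_install(int signo)
{
    if (pipe(wake_pipe) < 0)
        return -1;
    for (int i = 0; i < 2; i++) {
        int flags = fcntl(wake_pipe[i], F_GETFL);
        if (flags < 0 || fcntl(wake_pipe[i], F_SETFL, flags | O_NONBLOCK) < 0)
            return -1;
    }
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(signo, &sa, NULL);
}

/* Returns the signal number read from the pipe, 0 on timeout, -1 on error.
 * A real loop would add its sockets to the same fd set. */
int self_pipe_wait(long timeout_ms)
{
    assert(wake_pipe[0] >= 0);
    for (;;) {
        fd_set rfds;
        FD_ZERO(&rfds);
        FD_SET(wake_pipe[0], &rfds);
        struct timeval tv = { timeout_ms / 1000, (timeout_ms % 1000) * 1000 };
        int rc = select(wake_pipe[0] + 1, &rfds, NULL, NULL, &tv);
        if (rc < 0) {
            if (errno == EINTR)
                continue;         /* the byte is in the pipe; select again */
            return -1;
        }
        if (rc == 0)
            return 0;
        unsigned char byte;
        if (read(wake_pipe[0], &byte, 1) == 1)
            return byte;
    }
}
```

Note that this sketch restarts the full timeout after EINTR, which is acceptable for illustration but would need a deadline calculation in real code.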
The other tactic (much more sane) is to avoid the mess with signals and setitimer completely. setitimer is seriously legacy and causes problems for all sorts of things (e.g. it can cause functions like getaddrinfo to hang, a bug that still hasn't been fixed in glibc: http://www.cygwin.org/frysk/bugzilla/show_bug.cgi?id=15819). Signals are bad for your health. So the "normal" tactic is to use the timeout argument to select. You keep a linked list of timers, objects you use to manage periodic events in your code. When you call select, you pass as the timeout the shortest time remaining on any of your timers. When the select call returns, you check whether any timers have expired and call their handlers, as well as the handlers for your fd events. That's a standard application event loop. This way your loop code can listen for timer-driven events as well as fd-driven events. Pretty much every application on your system uses some variant of this mechanism.
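The core of such a loop can be sketched as one wait that ends either when the fd is readable or when the nearest timer is due. This is only an illustration with a single deadline instead of a timer list; deadline_after_ms and wait_fd_or_deadline are hypothetical names, and CLOCK_MONOTONIC is an assumption of this sketch:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <time.h>

/* Absolute CLOCK_MONOTONIC time that lies ms milliseconds in the future. */
struct timespec deadline_after_ms(long ms)
{
    struct timespec t;
    clock_gettime(CLOCK_MONOTONIC, &t);
    t.tv_sec += ms / 1000;
    t.tv_nsec += (ms % 1000) * 1000000L;
    if (t.tv_nsec >= 1000000000L) {
        t.tv_sec++;
        t.tv_nsec -= 1000000000L;
    }
    return t;
}

/* Wait until fd is readable or the deadline has passed.
 * Pass fd = -1 to wait for the timer only.
 * Returns 1 if fd is readable, 0 if the timer is due, -1 on error. */
int wait_fd_or_deadline(int fd, const struct timespec *deadline)
{
    assert(deadline != NULL);
    for (;;) {
        struct timespec now;
        clock_gettime(CLOCK_MONOTONIC, &now);
        long long left_ns = (deadline->tv_sec - now.tv_sec) * 1000000000LL
                          + (deadline->tv_nsec - now.tv_nsec);
        if (left_ns < 0)
            left_ns = 0;
        long long left_us = (left_ns + 999) / 1000;   /* round up */
        struct timeval tv;
        tv.tv_sec = left_us / 1000000;
        tv.tv_usec = left_us % 1000000;

        fd_set rfds;
        FD_ZERO(&rfds);
        if (fd >= 0)
            FD_SET(fd, &rfds);
        int rc = select(fd >= 0 ? fd + 1 : 0, &rfds, NULL, NULL, &tv);
        if (rc > 0)
            return 1;             /* handle the fd event */
        if (rc == 0)
            return 0;             /* run the timer handler */
        if (errno != EINTR)
            return -1;
        /* EINTR: loop and recompute the time remaining */
    }
}
```

For a 20 msec periodic task, the caller would run the task whenever 0 is returned and then advance the deadline by 20 msec before waiting again.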
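Would something like this be an option for you?
while (1) {
    /* select() may modify the fd sets and the timeout, so set them up again here */
    int rc = select(nfds, &readfds, &writefds, &exceptfds, &timeout);
    if ((rc < 0) && (errno == EINTR))
        continue;   /* interrupted by a signal: just call select() again */
    else {
        // some instructions
    }
}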
If this is not an option for you, you can probably use pselect(), which adds a final parameter (sigmask) specifying the set of signals that should be blocked during the pselect() call.
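As a hedged sketch of the pselect() variant: the helper below (pselect_blocking is a made-up name) takes the current signal mask, adds one signal to it, and passes the result as the sigmask argument so that this signal cannot interrupt the wait:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/select.h>
#include <time.h>

/* Wait for readfds like select(), but keep signo blocked for the
 * duration of the call. Returns what pselect() returns. */
int pselect_blocking(int nfds, fd_set *readfds, int signo, long timeout_ms)
{
    assert(timeout_ms >= 0);
    sigset_t mask;
    /* Start from the current mask so already-blocked signals stay blocked. */
    if (sigprocmask(SIG_BLOCK, NULL, &mask) < 0)
        return -1;
    if (sigaddset(&mask, signo) < 0)
        return -1;
    struct timespec ts;
    ts.tv_sec = timeout_ms / 1000;
    ts.tv_nsec = (timeout_ms % 1000) * 1000000L;
    return pselect(nfds, readfds, NULL, NULL, &ts, &mask);
}
```

The blocked signal is not lost: it stays pending and its handler runs once pselect() returns and the original mask is back in place. For a 20 msec timer this means the tick is delayed by up to the timeout, so the timeout should be kept short.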

Handling 'interrupted system call' error when using timer

I'm writing an application that uses a timer to do some data acquisition and processing at a fixed sample rate (200 Hz).
The application acts like a server and runs in the background. It should be controllable from other processes or other machines over UDP.
To do so, I use the timer_create() API to generate SIGUSR1 periodically and call a handler that does the acquisition and the processing.
The code to configure the timer is as follows (minus error checks for clarity):
sa.sa_flags = SA_SIGINFO;
sa.sa_sigaction = handler;
sigemptyset(&sa.sa_mask);
sigaction(SIGUSR1, &sa, NULL);
sev.sigev_notify = SIGEV_SIGNAL;
sev.sigev_signo = SIGUSR1;
sev.sigev_value.sival_ptr = &timerid;
timer_create(CLOCK_REALTIME, &sev, &timerid);
timer_settime(...)
The code above is called when a 'start' command is received over UDP. To check for commands, I have an infinite loop in my main program that calls the recvfrom() syscall.
The problem is that once a 'start' command is received and the timer is properly started and running (using the code above), I get an 'interrupted system call' error (EINTR) because the SIGUSR1 signal sent by the timer interrupts the recvfrom() call. If I check for this particular error code and ignore it, I eventually get a 'connection refused' error when calling recvfrom().
So here are my questions:
How do I solve this 'interrupted system call' error, given that ignoring it and redoing the recvfrom() doesn't seem to work?
Why do I get the 'connection refused' error after about twenty tries?
I have the feeling that using SIGEV_THREAD could be a solution since, as I understand it, it creates a new thread (like pthread_create) without generating a signal. Am I right?
Is the signal number important here? Is there any advantage to using a real-time signal?
Is there any other way to do what I intend to do: have a background loop checking for commands over UDP alongside a real-time periodic task?
And here is the bonus question:
Is it safe to do the data acquisition and the processing in the handler, or should I use a semaphore mechanism to wake up a thread that does it?
Solution:
As suggested in an answer and in the comments, using SA_RESTART seems to fix the main issue.
Solution 2:
Using SIGEV_THREAD instead of SIGEV_SIGNAL works too. I've read somewhere that SIGEV_THREAD could require more resources than SIGEV_SIGNAL. However, I have not seen a significant difference in the timing of the task.
Timers tend to be implemented using SIGALRM.
Signal receipt, including SIGALRM, tends to cause long-running system calls to return early with EINTR in errno.
SA_RESTART is one way around this, so system calls interrupted by receipt of a signal, will be automatically restarted. Another is to check for EINTR from your system calls' errno's and restart them when you receive EINTR.
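For illustration, installing a handler with SA_RESTART could look roughly like this. The helper name install_restarting_handler is hypothetical; with an SA_SIGINFO handler as in the question, the flags would be SA_SIGINFO | SA_RESTART instead:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Install handler for signo so that interrupted system calls such as
 * recvfrom() are restarted automatically instead of failing with EINTR. */
int install_restarting_handler(int signo, void (*handler)(int))
{
    assert(signo > 0);
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(signo, &sa, NULL);
}
```

On Linux, some calls are never restarted even with SA_RESTART, among them select(), poll() and nanosleep(), and socket calls with a receive or send timeout set. Those still need an explicit EINTR check.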
With read() and write() of course, you can't just restart, you need to pick up where you left off. That's why these return the length of data transmitted.
Given that you're using Linux, I would opt for using timerfd_create instead.
That way you can just select(2), poll(2) or epoll(7) instead and handle timer events without the difficulty of signal handlers in your main loop.
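A minimal Linux-only sketch of the timerfd approach, using poll(). The function names are invented for this example, and sock_fd stands for the UDP socket from the question:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdint.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

/* Create a timer fd that becomes readable every period_ms milliseconds. */
int make_periodic_timerfd(long period_ms)
{
    assert(period_ms > 0);
    int fd = timerfd_create(CLOCK_MONOTONIC, 0);
    if (fd < 0)
        return -1;
    struct itimerspec its;
    its.it_interval.tv_sec = period_ms / 1000;
    its.it_interval.tv_nsec = (period_ms % 1000) * 1000000L;
    its.it_value = its.it_interval;       /* first expiry after one period */
    if (timerfd_settime(fd, 0, &its, NULL) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Block until the timer fires or sock_fd is readable.
 * Returns the number of timer expirations (> 0) if the timer fired,
 * 0 if only the socket is readable, -1 on error. */
long wait_timer_or_socket(int timer_fd, int sock_fd)
{
    struct pollfd fds[2] = {
        { .fd = timer_fd, .events = POLLIN },
        { .fd = sock_fd,  .events = POLLIN },  /* a negative fd is ignored */
    };
    for (;;) {
        int rc = poll(fds, 2, -1);
        if (rc < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (fds[0].revents & POLLIN) {
            uint64_t ticks;
            if (read(timer_fd, &ticks, sizeof ticks) != sizeof ticks)
                return -1;
            return (long)ticks;
        }
        if (fds[1].revents & POLLIN)
            return 0;                     /* caller should recvfrom() now */
        return -1;                        /* error or hangup on one fd */
    }
}
```

For the 200 Hz case the period would be 5 msec. A return value greater than 1 means the loop fell behind and missed ticks.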
As for EINTR (Interrupted System Call), those are properly handled by just restarting the specific system call that got interrupted.
Restarting the interrupted system call is the correct response to EINTR. Your "Connection Refused" problem is an unrelated error - on a UDP socket, it indicates that a previous packet sent on that socket was rejected by the destination (notified through an ICMP message).
Question 5: Your use of a message and real-time periodic thread is perfectly fine. However, I would suggest you avoid using timers altogether, precisely because they use signals. I've run into this problem myself and eventually replaced the timer with a simple clock_nanosleep() that uses TIMER_ABSTIME with time updated to maintain the desired rate (i.e. add the period to the absolute time). The result was simpler code, no more problems with signals, and a more accurate timer than the signal-based timer. BTW, you should measure your timer's period in the handler to make sure it is accurate enough. My experience with timers was 8 years ago, so the problem with accuracy might be fixed. However, the other problems with signals are inherent to signals themselves and thus can't be "solved" -- only worked around.
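The clock_nanosleep() approach could be sketched like this. The function run_periodic and its parameters are made up for the example, and the choice of CLOCK_MONOTONIC is an assumption, since the answer does not say which clock was used:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Call task() every period_ns nanoseconds, iterations times.
 * Sleeping until an absolute time avoids accumulating drift. */
int run_periodic(long period_ns, int iterations, void (*task)(void))
{
    assert(period_ns > 0 && period_ns < 1000000000L);
    struct timespec next;
    if (clock_gettime(CLOCK_MONOTONIC, &next) < 0)
        return -1;
    for (int i = 0; i < iterations; i++) {
        /* Add the period to the absolute wake-up time. */
        next.tv_nsec += period_ns;
        if (next.tv_nsec >= 1000000000L) {
            next.tv_sec++;
            next.tv_nsec -= 1000000000L;
        }
        int rc;
        do {
            /* Returns an error number directly rather than setting errno. */
            rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        } while (rc == EINTR);
        if (rc != 0)
            return -1;
        if (task)
            task();
    }
    return 0;
}
```

For 200 Hz the period is 5000000 ns. Because the wake-up time is absolute, a signal that interrupts the sleep only requires calling clock_nanosleep() again with the same argument.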
Also, I see no problem with doing data acquisition from the handler, it should certainly reduce latency in retrieving the data.

How to avoid the interruption of sleep calls due to a signal in Linux?

I'm using a real-time signal in Linux to be notified of the arrival of new data on a serial port. Unfortunately this causes sleep calls to be interrupted whenever the signal arrives.
Does anybody know of a way to avoid this behavior?
I tried using a regular signal (SIGUSR1) but I keep getting the same behavior.
From the nanosleep manpage:
nanosleep delays the execution of the program for at least the time specified in *req. The function can return earlier if a signal has been delivered to the process. In this case, it returns -1, sets errno to EINTR, and writes the remaining time into the structure pointed to by rem unless rem is NULL. The value of *rem can then be used to call nanosleep again and complete the specified pause.
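As a small sketch of what the man page describes, the loop below calls nanosleep() again with the remaining time until the whole pause has elapsed. The name sleep_fully is hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Sleep for the full duration in *req, resuming after signal interruptions. */
int sleep_fully(const struct timespec *req)
{
    assert(req != NULL);
    struct timespec left = *req;
    /* On EINTR nanosleep() stores the unslept time in its second argument;
     * POSIX allows both arguments to point to the same object. */
    while (nanosleep(&left, &left) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return 0;
}
```

With very frequent signals, the rounding on each restart can make the total sleep slightly longer than requested.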
You can block almost all signals (except SIGKILL and SIGSTOP) using sigprocmask(), or ignore them using signal(). sigprocmask() returns the previous mask, which you can restore after sleep(). If that does not help, please give more detail about which signal interrupts your sleep. You can also check for this condition ("sleep interrupted by signal?") and go back to sleep.
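A sketch of the blocking variant, assuming a single-threaded program (a multithreaded one would use pthread_sigmask() instead). The helper name is invented for this example:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <time.h>

/* Sleep for *req with signo blocked, then restore the previous mask. */
int sleep_with_signal_blocked(int signo, const struct timespec *req)
{
    assert(req != NULL);
    sigset_t block, previous;
    sigemptyset(&block);
    if (sigaddset(&block, signo) < 0)
        return -1;
    if (sigprocmask(SIG_BLOCK, &block, &previous) < 0)
        return -1;
    int rc = nanosleep(req, NULL);        /* other signals can still interrupt */
    /* Restoring the old mask delivers a signo that arrived in the meantime. */
    if (sigprocmask(SIG_SETMASK, &previous, NULL) < 0)
        return -1;
    return rc;
}
```

The signal is only postponed, so the serial-port handler still runs, just after the sleep instead of during it.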
Newer Linux kernels support signalfd(2). That, together with sigprocmask(2), is a very nice way to combine handling of signal and IO events in a single epoll_wait(2) call.
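A minimal sketch of the signalfd(2) setup, with hypothetical helper names. The returned descriptor can be added to an epoll set next to the serial port fd; here it is simply read directly:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/signalfd.h>
#include <unistd.h>

/* Block signo and return a file descriptor that reports its arrival. */
int open_signalfd_for(int signo)
{
    sigset_t mask;
    sigemptyset(&mask);
    if (sigaddset(&mask, signo) < 0)
        return -1;
    /* The signal must be blocked, otherwise its handler or default
     * action would still run instead of the fd becoming readable. */
    if (sigprocmask(SIG_BLOCK, &mask, NULL) < 0)
        return -1;
    return signalfd(-1, &mask, SFD_CLOEXEC);
}

/* Read one pending signal from sfd; returns its number or -1 on error. */
int read_signal(int sfd)
{
    assert(sfd >= 0);
    struct signalfd_siginfo info;
    if (read(sfd, &info, sizeof info) != sizeof info)
        return -1;
    return (int)info.ssi_signo;
}
```

Since the signal is blocked, it no longer interrupts sleep calls at all; it is consumed only when the program reads the descriptor.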
If you don't want to be interrupted, why are you using the real time signal?
Somewhere, either in Rochkind's "Advanced UNIX Programming" or Stevens's book, there was an example of how to fake this out. You make note of the current time_t before starting your sleep. After the sleep ends, you check to make sure the required amount of time has elapsed, and if it hasn't, you start a new sleep. Put the sleep in a loop that calculates the time to go and sleeps that amount, and exits when the required time has passed.
Well, a realtime signal is supposed to interrupt sleep. You could use a non-realtime signal instead. Another approach is to check if the expected time to sleep has elapsed, and if not, sleep for the remaining interval.

Non-blocking pthread_join

I'm coding the shutdown of a multithreaded server. If everything goes as it should, all the threads exit on their own, but there's a small chance that a thread gets stuck. In this case it would be convenient to have a non-blocking join.
Is there a way of doing a non-blocking pthread_join?
Some sort of timed join would be good too.
something like this:
foreach thread do
nb_pthread_join();
if still running
pthread_cancel();
I can think of more cases where a non-blocking join would be useful.
Since there seems to be no such function, I have already coded a workaround, but it's not as simple as I would like.
If you are running your application on Linux, you may be interested in these functions:
int pthread_tryjoin_np(pthread_t thread, void **retval);
int pthread_timedjoin_np(pthread_t thread, void **retval,
const struct timespec *abstime);
Be careful: as the suffix suggests, "np" means "non-portable". They are GNU extensions rather than POSIX standard functions, but they are useful.
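A sketch of how the timed variant might be used for the shutdown case in the question. join_or_cancel and the two worker functions are made up for this illustration, and the code is specific to glibc:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>
#include <unistd.h>

/* Give thread up to timeout_ms to finish; cancel it if it does not.
 * Returns 0 if it exited on its own, 1 if it was cancelled, -1 on error. */
int join_or_cancel(pthread_t thread, long timeout_ms)
{
    assert(timeout_ms >= 0);
    struct timespec deadline;
    /* pthread_timedjoin_np() expects an absolute CLOCK_REALTIME time. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    int rc = pthread_timedjoin_np(thread, NULL, &deadline);
    if (rc == 0)
        return 0;
    if (rc != ETIMEDOUT)
        return -1;

    if (pthread_cancel(thread) != 0)
        return -1;
    /* Still blocks until the thread reaches a cancellation point. */
    return pthread_join(thread, NULL) == 0 ? 1 : -1;
}

/* Example workers, for demonstration only. */
void *quick_worker(void *arg)
{
    return arg;
}

void *stuck_worker(void *arg)
{
    (void)arg;
    for (;;)
        pause();                  /* pause() is a cancellation point */
}
```

A thread that is stuck in a loop without cancellation points would still hang the final pthread_join(), so this only helps with threads blocked in cancellable calls.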
The 'pthread_join' mechanism is a convenience to be used if it happens to do exactly what you want. It doesn't do anything you couldn't do yourself, and where it's not exactly what you want, code exactly what you want.
There is no real reason you should actually care whether a thread has terminated or not. What you care about is whether the work the thread was doing is completed. To tell that, have the thread do something to indicate that it is working. How you do that depends on what is ideal for your specific problem, which depends heavily on what the threads are doing.
Start by changing your thinking. It's not a thread that gets stuck, it's what the thread was doing that gets stuck.
If you're developing for QNX, you can use pthread_timedjoin() function.
Otherwise, you can create a separate thread that performs pthread_join() and alerts the parent thread, by signalling a semaphore for example, when the child thread completes. This separate thread can return what it gets from pthread_join() to let the parent thread determine not only when the child completes but also what value it returns.
As others have pointed out there is not a non-blocking pthread_join available in the standard pthread libraries.
However, given your stated problem (trying to guarantee that all of your threads have exited on program shutdown) such a function is not needed. You can simply do this:
int i;
int killed_threads = 0;
for (i = 0; i < num_threads; i++) {
    /* "return" is a reserved word, so the result is stored in rc */
    int rc = pthread_cancel(threads[i]);
    if (rc != ESRCH)
        killed_threads++;
}
if (killed_threads)
    printf("%d threads did not shutdown properly\n", killed_threads);
else
    printf("All threads exited successfully\n");
There is nothing wrong with calling pthread_cancel on all of your threads (terminated or not) so calling that for all of your threads will not block and will guarantee thread exit (clean or not).
That should qualify as a 'simple' workaround.
The answer really depends on why you want to do this. If you just want to clean up dead threads, for example, it's probably easiest just to have a "dead thread cleaner" thread that loops and joins.
I'm not sure what exactly you mean, but I'm assuming that what you really need is a wait and notify mechanism.
In short, here's how it works: You wait for a condition to satisfy with a timeout. Your wait will be over if:
The timeout occurs, or
If the condition is satisfied.
You can have this in a loop and add some more intelligence to your logic. The best resource I've found for this related to Pthreads is this tutorial:
POSIX Threads Programming (https://computing.llnl.gov/tutorials/pthreads/).
I'm also very surprised to see that there's no API for timed join in Pthreads.
There is no timed pthread_join, but if you are waiting for another thread that is blocked on a condition, you can use pthread_cond_timedwait instead of pthread_cond_wait.
You could push a byte into a pipe opened as non-blocking to signal to the other thread when it's done, then use a non-blocking read to check the status of the pipe.
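One way to sketch this idea is a small "completion" object that the worker signals as its last action and that the main thread waits on with a timeout. The struct and function names here are invented for the example:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

struct completion {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            done;
};

void completion_init(struct completion *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cond, NULL);
    c->done = false;
}

/* Called by the worker thread just before it returns. */
void completion_signal(struct completion *c)
{
    pthread_mutex_lock(&c->lock);
    c->done = true;
    pthread_cond_broadcast(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

/* Wait up to timeout_ms for the worker. Returns true if it finished. */
bool completion_wait(struct completion *c, long timeout_ms)
{
    assert(c != NULL && timeout_ms >= 0);
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);  /* default condvar clock */
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&c->lock);
    int rc = 0;
    while (!c->done && rc == 0)   /* loop guards against spurious wakeups */
        rc = pthread_cond_timedwait(&c->cond, &c->lock, &deadline);
    bool finished = c->done;
    pthread_mutex_unlock(&c->lock);
    return finished;
}
```

If completion_wait() returns true, a following pthread_join() on that worker should return almost immediately; if it returns false, the caller can decide to cancel the thread.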
