Slow system call and signals - c

I have read this subject:
Relationship slow system call with signal
and not everything is clear to me. In particular, I don't understand this part of the answer, because I don't see a problem with the included source code.
Please explain it to me.
Thanks in advance.
Anyway, back to the question. If you're wondering why the read doesn't
fail with EINTR the answer is SA_RESTART. On most Unix systems a few
system calls are automatically restarted in the event of a signal.

The OP was expecting the read call to return an error code because it was interrupted by a signal. In the case of the read system call, the OS automatically restarts this system call in the event of a signal, so no error was returned.
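A minimal sketch of the difference, assuming a POSIX system such as Linux. The helper name read_during_signal, the pipe, and the 50 ms timer are illustrative assumptions and do not come from the original question. The handler makes one byte available on the pipe, so a restarted read() has something to return, while a non-restarted read() reports EINTR.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

static int wake_fd = -1;   /* write end of the pipe, used by the handler */

/* Async-signal-safe handler: makes one byte available to the reader. */
static void on_alarm(int signo)
{
    (void)signo;
    if (write(wake_fd, "x", 1) < 0) {
        /* nothing useful can be done about it inside a handler */
    }
}

/*
 * Blocks in read() on an empty pipe and lets SIGALRM arrive after ~50 ms.
 * Returns what read() returned; *err receives errno when that is -1.
 */
static ssize_t read_during_signal(int use_sa_restart, int *err)
{
    int fds[2];
    char byte;
    struct sigaction sa;
    struct itimerval timer;

    if (pipe(fds) == -1) {
        *err = errno;
        return -1;
    }
    wake_fd = fds[1];

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = use_sa_restart ? SA_RESTART : 0;
    sigaction(SIGALRM, &sa, NULL);

    memset(&timer, 0, sizeof timer);
    timer.it_value.tv_usec = 50000;          /* one-shot timer, 50 ms */
    setitimer(ITIMER_REAL, &timer, NULL);

    ssize_t n = read(fds[0], &byte, 1);      /* blocks until the signal */
    *err = (n == -1) ? errno : 0;

    close(fds[0]);
    close(fds[1]);
    return n;
}
```

Calling read_during_signal(0, &err) is expected to give -1 with err set to EINTR, and read_during_signal(1, &err) is expected to give 1, because the kernel restarts the read after the handler returns.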

Related

Why does `nanosleep` need the argument of `req` while kernel has the chance to restart the system call again internally(`-ERESTARTSYS`)?

The documentation (https://lwn.net/Articles/17744/) says: "nanosleep(), which is currently the only user of this mechanism, need only save the wakeup time in the restart block, along with pointers to the user arguments..".
If so, why does nanosleep need an argument req of type struct timespec *?
According to the Linux programmer's manual: "int nanosleep(const struct timespec *req, struct timespec *rem); If the call is interrupted by a signal handler, nanosleep() returns -1, sets errno to EINTR, and writes the remaining time into the structure pointed to by rem unless rem is NULL."
I think that if the kernel can restart the system call (do_nanosleep) internally, there is no need to return the remaining sleep time to user space. That's what I don't understand.
ERESTARTSYS should never be seen from user code, you are correct. It is a flag for the kernel to restart a call itself, or return EINTR to user code. Please see this discussion on the Linux Kernel Mailing List:
So which way is it supposed to be (so someone can patch things up to make it consistent):
1. User space should never see ERESTARTSYS from any system call
Yes. The kernel either transforms it to EINTR, or restarts the syscall when the signal handler returns.
Or this article on LWN.net
What happens, though, if a signal is queued for the process while it
is waiting? In that case, the system call needs to abort its work and
allow the actual delivery of the signal. For this reason, kernel code
which sleeps tends to follow the sleep with a test like:
    if (signal_pending(current))
        return -ERESTARTSYS;
After the signal has been handled, the system call will be restarted
(from the beginning), and the user-space application need not deal
with "interrupted system call" errors. For cases where restarting is
not appropriate, a -EINTR return status will cause a (post-signal)
return to user space without restarting the system call.
I don't think any of this has to do with nanosleep(2) params other than that under the cover it uses this mechanism. The nanosleep docs tell you what the params do, req is how long you want to sleep, and rem is how long you have left if you get woken up early.
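To make the role of the two parameters concrete, here is a minimal sketch of a caller that wants the full sleep even when signals arrive. The function name sleep_fully is hypothetical; the rest uses only the documented nanosleep() behaviour quoted in the question.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <time.h>

/* Sleeps for the whole of *req, resuming after every signal interruption. */
static int sleep_fully(const struct timespec *req)
{
    struct timespec remaining = *req;
    struct timespec next;

    while (nanosleep(&remaining, &next) == -1) {
        if (errno != EINTR)
            return -1;          /* e.g. EINVAL: a real error, give up */
        remaining = next;       /* rem holds the time still owed */
    }
    return 0;
}
```

A caller that prefers to react to the signal instead would simply not loop, and could inspect rem to see how much of the sleep was left.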
The title of the question doesn't entirely match the actual question. @dsolimano did answer the title.
However, it seems that you're asking why code that calls nanosleep() needs to handle a case like EINTR if ERESTARTSYS presumably solves the problem in the kernel.
Assuming that is the question, the answer is that EINTR is not a problem to be solved: callers sometimes want it.
Here are a couple of use cases for EINTR:
You want to wait for a certain amount of time, but be able to handle signals synchronously (i.e. not in a signal handler). For example, you are waiting for a DB to initialize, but if the user presses Ctrl+C you want to show the current DB status and continue waiting.
You want to wait for a signal, but with a timeout. So you sleep for the timeout, but if nanosleep() returned EINTR you know you got a signal.
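The second use case can be sketched as follows, assuming a POSIX system. The names wait_for_signal and arm_alarm_ms are hypothetical, and SIGALRM driven by a timer merely stands in for whatever signal the program is really waiting for.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

/* The handler only needs to exist so that the signal interrupts the sleep. */
static void on_signal(int signo)
{
    (void)signo;
}

/* Returns 1 if a handled signal cut the sleep short, 0 on timeout, -1 on error. */
static int wait_for_signal(const struct timespec *timeout, struct timespec *left)
{
    if (nanosleep(timeout, left) == 0)
        return 0;
    return errno == EINTR ? 1 : -1;
}

/* Demo helper (assumption): handle SIGALRM and fire it once after `ms`. */
static void arm_alarm_ms(long ms)
{
    struct sigaction sa;
    struct itimerval timer;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGALRM, &sa, NULL);

    memset(&timer, 0, sizeof timer);
    timer.it_value.tv_sec = ms / 1000;
    timer.it_value.tv_usec = (ms % 1000) * 1000;
    setitimer(ITIMER_REAL, &timer, NULL);
}
```

With the alarm armed for 50 ms, a two-second wait_for_signal() call should return 1 with most of the two seconds still recorded in left; with no signal pending, a short wait returns 0.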
Regarding your "auxiliary questions", I'll tl;dr @dsolimano's answers:
What are the differences between ERESTARTSYS and EINTR?
ERESTARTSYS is a kernel implementation detail, EINTR is part of the kernel's API.
Is ERESTARTSYS only used in kernel or driver?
Yes.
Why does nanosleep() need an argument of type struct timespec *req?
req is the requested sleep duration (seconds plus nanoseconds). You probably meant rem. The first use case I outlined above is an example why.

How to return from a select with SIGINT

I need your help to solve this problem.
I have to create a multi-threaded client-server program on unix, based on AF_UNIX
sockets, that must handle up to some thousands simultaneous connections and also must do different things based on the type of signal received, like shutdown when server receives a SIGINT.
I thought of doing it this way. First, block SIGINT and the other signals in the main thread's signal mask. Then start a dispatching thread that waits in select() for I/O requests (I know this is really inefficient), accepts each new connection, and reads exactly sizeof(request) bytes, where request is a well-known structure. Next, create a signal-handling thread, the only one that accepts the signals, using sigwait(). Finally, start the other server threads that do the real work.
I have these questions:
I would like select() to return even when the dispatcher thread is blocked in it. I've read about the self-pipe trick, but I think I got it wrong: even when the signal-handling thread writes to the pipe that is in select()'s read set, select() doesn't return. How can I make select() return?
I've read that epoll() is the efficient way to handle many simultaneous connections. Should I use it, and if so, how? I can't figure it out just from reading man epoll, and my textbook doesn't even mention it.
Are there good practices for handling system failures? I check almost every system call's return value so that I can handle the error, free memory, and so on, but my code keeps growing, mostly with the same operations repeated many times. How could I write a cleanup function that frees memory before terminating with abort()?
Anyway, thanks a lot in advance for your help. This platform is really amazing, and when I become more expert I'll pay the community back by helping others!
(Sorry for my English, but it's not my mother language)
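For reference, the self-pipe trick the question mentions can be sketched roughly like this. The function names wake_dispatcher and wait_for_work are hypothetical, and the sketch assumes the pipe was created with pipe() before the threads were started. Two details are common reasons for select() not returning: the first argument must be the highest descriptor plus one, and the fd_set must be rebuilt before every call because select() overwrites it.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <sys/select.h>
#include <unistd.h>

/* Called by whichever thread learns about the signal (e.g. after sigwait()). */
static int wake_dispatcher(int pipe_write_fd)
{
    ssize_t n;

    do {
        n = write(pipe_write_fd, "!", 1);
    } while (n == -1 && errno == EINTR);
    return n == 1 ? 0 : -1;
}

/*
 * Dispatcher side: blocks in select() on the listening socket and the pipe.
 * Returns 1 if the wake-up pipe is readable, 0 if only listen_fd is,
 * and -1 on error.
 */
static int wait_for_work(int listen_fd, int pipe_read_fd)
{
    for (;;) {
        fd_set readable;
        int maxfd = listen_fd > pipe_read_fd ? listen_fd : pipe_read_fd;

        FD_ZERO(&readable);              /* select() overwrites the set */
        FD_SET(listen_fd, &readable);
        FD_SET(pipe_read_fd, &readable);

        if (select(maxfd + 1, &readable, NULL, NULL, NULL) == -1) {
            if (errno == EINTR)
                continue;                /* interrupted: just wait again */
            return -1;
        }
        return FD_ISSET(pipe_read_fd, &readable) ? 1 : 0;
    }
}
```

Note that select() can only watch descriptors below FD_SETSIZE (commonly 1024), which matters for a server expecting thousands of connections.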

Why is this not a bug in qmail?

I was reading DJB's "Some thoughts on security after ten years of Qmail 1.0" and he listed this function for moving a file descriptor:
    int fd_move(to,from)
    int to;
    int from;
    {
      if (to == from) return 0;
      if (fd_copy(to,from) == -1) return -1;
      close(from);
      return 0;
    }
It occurred to me that this code does not check the return value of close, so I read the man page for close(2), and it seems it can fail with EINTR, in which case the appropriate behavior would seem to be to call close again with the same argument.
Since this code was written by someone with far more experience than I in both C and UNIX, and additionally has stood unchanged in qmail for over a decade, I assume there must be some nuance that I'm missing that makes this code correct. Can anyone explain that nuance to me?
I've got two answers:
He was trying to make a point about factoring out common code and often such examples omit error checking for brevity and clarity.
close(2) may return EINTR, but does it in practice, and if so what would you reasonably do? Retry once? Retry until success? What if you get EIO? That could mean almost anything, so you really have no reasonable recourse except logging it and moving on. If you retry after an EIO, you might get EBADF, then what? Assume that the descriptor is closed and move on?
Any blocking system call can return EINTR, especially one like read(2) that waits on a slow human. This is a more likely scenario, and a good "get input from terminal" routine will indeed check for it. That also means that write(2) can fail, even when writing a log file. Do you try to log the error that the logger generated, or should you just give up?
When a file descriptor is dup'd, as it is in the fd_copy or dup2 function, you will end up with more than one file descriptor referring to the same thing (i.e. the same struct file in the kernel). Closing one of them will simply decrement its reference count. No operation is performed on the underlying object unless it is the last close. As a result, conditions such as EINTR and EIO are not possible.
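The reference-counting behaviour can be observed with a small sketch. The function name duplicate_survives_close is hypothetical, and a pipe stands in for whatever descriptor fd_copy duplicates.

```c
#define _XOPEN_SOURCE 700
#include <unistd.h>

/*
 * Returns 1 if a duplicate keeps working after the original descriptor is
 * closed and the object is only released by the last close; 0 or -1 otherwise.
 */
static int duplicate_survives_close(void)
{
    int fds[2];
    char got = 0;
    int still_open, released;

    if (pipe(fds) == -1)
        return -1;

    int copy = dup(fds[1]);
    if (copy == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    close(fds[1]);   /* not the last reference: only drops the count */
    still_open = write(copy, "k", 1) == 1
              && read(fds[0], &got, 1) == 1
              && got == 'k';

    close(copy);     /* last reference to the write end: now it is released */
    released = read(fds[0], &got, 1) == 0;   /* reader sees end-of-file */

    close(fds[0]);
    return still_open && released;
}
```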
Another possibility is that his function is used only in an application (or a part of one) which has done something to ensure that the call will not be interrupted by a signal. If you aren't going to do anything important with signals, then you don't have to be responsive to them, and it might make sense to mask them all out, rather than wrap every single blocking system call in an EINTR retry. Except of course the ones that will kill you, so SIGKILL and frequently SIGPIPE if you handle it by quitting, along with SIGSEGV and similar fatal errors which will in any case never be delivered to a correct user-space app.
Anyway, if all he's talking about is security, then quite possibly he doesn't have to retry close. If close failed with EIO, then he would not be able to retry it, it would be a permanent failure. Therefore, it is not necessary for the correctness of his program that close succeeds. It may well be that it is not necessary for the correctness of his program that close be retried on EINTR, either.
Usually you want your program to make a best effort to succeed, and that means retrying on EINTR. But this is a separate concern from security. If your program is designed so that some function failing for any reason isn't a security flaw, then in particular the fact that it happens to have failed EINTR, rather than for a permanent reason, isn't a flaw. DJB has been known to be fairly opinionated, so I would not be at all surprised if he has proved some reason why he doesn't need to retry, and therefore doesn't bother, even if doing so would allow his program to succeed in flushing the handle in certain situations where maybe it currently fails (like being explicitly sent a harmless signal with kill by the user at a crucial moment).
Edit: it occurs to me that retrying on EINTR could potentially itself be a security flaw under certain conditions. It introduces a new behaviour to that section of code: it can loop indefinitely in response to a signal flood, where previously it would make one attempt to close and then return. I don't know for sure that this would cause qmail any problems (after all, close itself makes no guarantees how soon it will return). But if giving up after one attempt does make the code easier to analyse then it could plausibly be a smart move. Or not.
You might think that retrying prevents a DoS flaw, where a signal causes a spurious failure. But retrying allows another (more difficult) DoS flaw, where a signal flood causes an indefinite stall. In terms of binary "can this app be DoSed?", which is the kind of absolute security question DJB was interested in when he wrote qmail and djbdns, it makes no difference. If something can happen once, then normally that means it can happen many times.
Only broken unices ever return EINTR without you explicitly asking for it. The sane semantics for signal() enable restartable system calls ("BSD style"). When building a program on a system with the sysv semantics (interrupting signals), you should always replace calls to signal() with calls to bsd_signal(), which you can define yourself in terms of sigaction() if it doesn't exist.
It's further worth noting that no systems will return EINTR on signal receipt unless you have installed signal handlers. If the default action is left in place, or if the signal is set to no action, it's impossible for system calls to be interrupted.
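A replacement for signal() with restartable ("BSD style") semantics, as described above, might look like the following sketch. The name restartable_signal is hypothetical; only sigaction() and SA_RESTART are taken from POSIX.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>

typedef void (*signal_handler)(int);

/*
 * Installs `handler` for `signo` so that interrupted system calls are
 * restarted where the system supports it. Returns the previous handler,
 * or SIG_ERR on failure, mirroring the signal() interface.
 */
static signal_handler restartable_signal(int signo, signal_handler handler)
{
    struct sigaction sa, old;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;   /* the handler stays installed, calls restart */

    if (sigaction(signo, &sa, &old) == -1)
        return SIG_ERR;
    return old.sa_handler;
}
```

Some calls are not restarted even with SA_RESTART (nanosleep() is one example on Linux), so code that must be robust may still need to check for EINTR.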

detect program termination (C, Windows)

I have a program that has to perform certain tasks before it finishes. The problem is that sometimes the program crashes with an exception (like database cannot be reached, etc).
Now, is there any way to detect an abnormal termination and execute some code before it dies?
Thanks.
code is appreciated.
1. Win32
The Win32 API contains a way to do this via the SetUnhandledExceptionFilter function, as follows:
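    #include <windows.h>
    #include <stdio.h>

    /* Top-level filter: runs when no other handler claims the exception. */
    LONG WINAPI myFunc(LPEXCEPTION_POINTERS p)
    {
        printf("Exception!!!\n");
        return EXCEPTION_EXECUTE_HANDLER;
    }

    int main()
    {
        SetUnhandledExceptionFilter(myFunc);
        // generate an exception !
        volatile int x = 0;   /* volatile so the division is not folded away */
        int y = 1 / x;
        return 0;
    }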
2. POSIX/Linux
I usually do this via the signal() function, handling the SIGSEGV signal appropriately. You can also handle SIGTERM and SIGINT, but not SIGKILL (by design). strace is a command-line tool that shows the system calls and signals leading up to the crash; for an actual backtrace, glibc provides backtrace() in <execinfo.h>.
There are sysinternals forum threads about protecting against end-process attempts by hooking NT Internals, but what you really want is either a watchdog or peer process (reasonable approach) or some method of intercepting catastrophic events (pretty dicey).
Edit: There are reasons why they make this difficult, but it's possible to intercept or block attempts to kill your process. I know you're just trying to clean up before exiting, but as soon as someone releases a process that can't be immediately killed, someone will ask for a method to kill it immediately, and so on. Anyhow, to go down this road, see above linked thread and search some keywords you find in there for more. hook OR filter NtTerminateProcess etc. We're talking about kernel code, device drivers, anti-virus, security, malware, rootkit stuff here. Some books to help in this area are Windows NT/2000 Native API, Undocumented Windows 2000 Secrets: A Programmer's Cookbook, Rootkits: Subverting the Windows Kernel, and, of course, Windows® Internals: Fifth Edition. This stuff is not too tough to code, but pretty touchy to get just right, and you may be introducing unexpected side-effects.
Perhaps Application Recovery and Restart Functions could be of use? Supported by Vista and Server 2008 and above.
ApplicationRecoveryCallback Callback Function Application-defined callback function used to save data and application state information in the event the application encounters an unhandled exception or becomes unresponsive.
On using SetUnhandledExceptionFilter: an MSDN Social discussion says that the only way to be sure your filter gets called is to patch that function in memory, and advises wrapping the code in __try/__except instead. Regardless, there is some sample code and discussion of filtering calls to SetUnhandledExceptionFilter in the article "SetUnhandledExceptionFilter" and VC8.
Also, see Windows SEH Revisited at The Awesome Factor for some sample code of AddVectoredExceptionHandler.
It depends on what you do with your "exceptions". If you handle them properly and exit from the program, you can register your function to be called on exit, using atexit().
It won't work in the case of a real abnormal termination, like a segfault.
I don't know about Windows, but on a POSIX-compliant OS you can install a signal handler that will catch different signals and do something about them. Of course, you cannot catch SIGKILL and SIGSTOP.
The signal API has been part of ANSI C since C89, so Windows probably supports it. See the signal() function for details.
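On the POSIX side, a minimal sketch of installing such a handler could look like this. The names remember_signal and install_cleanup_handler are hypothetical. The handler only records the signal number, and the actual cleanup is assumed to happen later in normal program flow.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>

/* Written by the handler, read by the main program. */
static volatile sig_atomic_t last_signal = 0;

static void remember_signal(int signo)
{
    last_signal = signo;
}

/* Returns 0 if a handler could be installed for signo, -1 otherwise. */
static int install_cleanup_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = remember_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}
```

Installing the handler for SIGTERM or SIGINT succeeds, while the same call for SIGKILL or SIGSTOP fails, which is the "cannot catch" restriction mentioned above.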
If it's Windows-only, then you can use SEH (SetUnhandledExceptionFilter), or VEH (AddVectoredExceptionHandler, but it's only for XP/2003 and up)
Sorry, not a windows programmer. But maybe
_onexit()
Registers a function to be called when program terminates.
http://msdn.microsoft.com/en-us/library/aa298513%28VS.60%29.aspx
First, though this is fairly obvious: You can never have a completely robust solution -- someone can always just hit the power cable to terminate your process. So you need a compromise, and you need to carefully lay out the details of that compromise.
One of the more robust solutions is putting the relevant code in a wrapper program. The wrapper program invokes your "real" program, waits for its process to terminate, and then -- unless your "real" program specifically signals that it has completed normally -- runs the cleanup code. This is fairly common for things like test harnesses, where the test program is likely to crash or abort or otherwise die in unexpected ways.
That still gives you the difficulty of what happens if someone does a TerminateProcess on your wrapper function, if that's something you need to worry about. If necessary, you could get around that by setting it up as a service in Windows and using the operating system's features to restart it if it dies. (This just changes things a little; someone could still just stop the service.) At this point, you probably are at a point where you need to signal successful completion by something persistent like creating a file.
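The wrapper idea can be sketched as below. This is a POSIX analogue using fork() and waitpid(), chosen here only because it is short. On Windows the equivalent would be built on the process-creation API, which is not shown. The names run_supervised, job_ok, and job_crash are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Two stand-ins for the "real" program: one finishes, one dies abnormally. */
static void job_ok(void) { }
static void job_crash(void) { abort(); }

/*
 * Runs `job` in a child process. Returns 1 if the child reported normal
 * completion, 0 if it crashed or reported failure, -1 if it could not be
 * started or waited for. The caller runs its cleanup code on 0.
 */
static int run_supervised(void (*job)(void))
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {              /* child: the "real" program */
        job();
        _exit(0);                /* explicit "completed normally" signal */
    }
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```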
I published an article at ddj.com about "post mortem debugging" some years ago.
It includes sources for Windows and Unix/Linux to detect abnormal termination. In my experience, though, a Windows handler installed using SetUnhandledExceptionFilter is not always called. In many cases it is, but I receive quite a few log files from customers that do not include a report from the installed handlers, where e.g. an ACCESS VIOLATION was the cause.
http://www.ddj.com/development-tools/185300443

Strategy flushing file outputs at termination

I have an application that monitors a high-speed communication link and writes logs to a file (via standard C file IO). The response time to messages that arrive on the link is important, so I knowingly don't fflush the file at each message, because this slows down my response time.
However, in some circumstances my application is terminated "violently" (e.g. by killing the process), and in these cases the last few log messages are not written (even if the communication link has been quiet for some time).
What techniques/strategies can I use to make sure most of my data is flushed, but without giving up speed of response?
Edit: The application runs on Windows
Using a thread is the standard solution to this. Have your data collection code write data to a thread-safe queue and use a semaphore to signal the writing thread.
However, before you go there, double-check your assumption that fflush() would be slow. Most operating systems have a file system cache, which makes writes very fast: a simple memory-to-memory block copy. The data gets written to disk lazily, and a crash of your process won't affect it.
If you are on Unix or Linux, your process would receive some termination signal which you can catch (except SIGKILL) and fflush() in your signal handler.
For signal catching see man sigaction.
EDIT: No idea about Windows.
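One caveat on the Unix approach: POSIX does not list fflush() among the async-signal-safe functions, so calling it directly inside the handler is risky. A common alternative, sketched here with hypothetical names (request_stop, install_stop_handler, flush_if_stopping), is to set a flag in the handler and flush from the main loop.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stdio.h>
#include <string.h>

static volatile sig_atomic_t stop_requested = 0;

/* The handler does nothing except set a flag. */
static void request_stop(int signo)
{
    (void)signo;
    stop_requested = 1;
}

static int install_stop_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = request_stop;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Called from the main loop between messages; returns 1 when it flushed. */
static int flush_if_stopping(FILE *log)
{
    if (!stop_requested)
        return 0;
    fflush(log);
    return 1;
}
```

The main loop would call flush_if_stopping() after each message and exit when it returns 1, so the flush happens in ordinary context and the time-critical path only pays for reading one flag.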
I would suggest an asynchronous write-through. That way you don't need to wait for the write I/O operation to complete, nor will the OS delay it. See the CreateFile() flags FILE_FLAG_WRITE_THROUGH | FILE_FLAG_OVERLAPPED.
You don't need FILE_FLAG_NO_BUFFERING. That's only to skip the OS cache. You would only need it if you are worried about the entire OS dying violently.
If your program terminates by calling exit() or returning from main(), the C standard guarantees that open streams are flushed and closed, so no special handling is needed. It sounds from your description like this is what is happening: if your program died due to a signal, you wouldn't see the flush.
I'm having trouble understanding what the problem is exactly.
If it's just that you're trying to find a happy medium between flushing often and the default fully buffered output, then maybe line buffering is what you want:
setvbuf(stream, 0, _IOLBF, 0);
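The effect of line buffering can be checked with a small sketch, assuming a POSIX system with a glibc-like stdio. The function name bytes_handed_to_os and the temporary file path are hypothetical. Microsoft's documentation describes _IOLBF as behaving like full buffering on Win32 and requires a buffer size of at least 2, so this may not carry over to the Windows application in the question.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Writes `text` to a line-buffered stream and reports how many bytes have
 * reached the file before any fflush() or fclose(). Returns -1 on error.
 */
static long bytes_handed_to_os(const char *text)
{
    char path[] = "/tmp/linebuf-XXXXXX";
    struct stat st;
    long size = -1;

    int fd = mkstemp(path);
    if (fd == -1)
        return -1;

    FILE *log = fdopen(fd, "w");
    if (log == NULL) {
        close(fd);
        unlink(path);
        return -1;
    }

    setvbuf(log, NULL, _IOLBF, 0);   /* must precede any other I/O on the stream */
    fputs(text, log);

    if (stat(path, &st) == 0)
        size = (long)st.st_size;

    fclose(log);
    unlink(path);
    return size;
}
```

A complete line such as "message 1\n" is expected to reach the file immediately, while text without a trailing newline stays in the stdio buffer until a flush.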

Resources