What happens to a process if the filesystem is full? Does the kernel send us a signal to shutdown and if so what signal is it. Obviously, a program will probably crash if it writes to the file system but I'm curious as to how this occurs (in gory kernel/operating system detail).
What happens to a process if the filesystem fills up?
Operations that would require additional disk space on the full partition (like creating or appending to a file) fail with an errno of ENOSPC.
No signal is sent, as a full filesystem is not a critical condition which makes a signal necessary. It's a routine, easily handled error.
There is no reason a program should crash when the filesystem is full. Obviously file writes will fail, but a well-written program should be able to cope with that - in C, this would mean that fopen returns NULL or ferror returns a non-zero value, etc. I have encountered this many times, and some nasty things can happen such as overwriting a file with a blank version, but never a program crash. If it does happen, it is presumably because the author of the program tried to use a NULL file descriptor or some similar problem, in which case the program would receive a SIGSEGV as usual.
Related
What would happen if you call read (or write, or both) in two different thread, on the same file descriptor (lets says we are interested about a local file, and a it's a socket file descriptor), without using explicitly a synchronization mechanism?
Read and Write are syscall, so, on a single core CPU, it's probably unlucky that two read would be executed "at the same time". But with multiple cores...
What the linux kernel will do?
And let's be a bit more general : is the behavior always the same for other kernels (like BSDs) ?
Edit : According to the close documentation, we should be sure that the file descriptor isn't used by a syscall in an other thread. So it seams that explicit synchronization would be required before closing a file descriptor (and so, also around read/write if thread that may call it are still running).
Any system level (syscall) file descriptor access is thread safe in all mainstream UNIX-like OSes.
Though depending on the age they are not necessarily signal safe.
If you call read, write, accept or similar on a file descriptor from two different tasks then the kernel's internal locking mechanism will resolve contention.
For reads each byte may be only read once though and writes will go in any undefined order.
The stdio library functions fread, fwrite and co. also have by default internal locking on the control structures, though by using flags it is possible to disable that.
The comment about close is because it doesn't make a lot of sense to close a file descriptor in any situation in which some other thread might be trying to use it. So while it is 'safe' as far as the kernel is concerned, it can lead to odd, hard to diagnose corner cases.
If a thread closes a file descriptor while a second thread is trying to read from it, the second thread may get an unexpected EBADF error. Worse, if a third thread is simultaneously opening a new file, that might reallocate the same fd, and the second thread might accidentally read from the new file rather than the one it was expecting...
Have a care for those who follow in your footsteps
It's perfectly normal to protect the file descriptor with a mutex semaphore. It removes any dependence on kernel behaviour so your message boundaries are now certain. You then don't have to cite the last paragraph at the bottom of a 15,489 line manpage which explains why the mutex isn't necessary (I exaggerated, but you get my meaning)
It also makes it clear to anyone reading your code that the file descriptor is being used by more than one thread.
Fringe Benefit
There is a fringe benefit to using a mutex that way. Suppose you've got different messages coming from the different threads and some of those messages are more important than others. All you need to do is set the thread priorities to reflect their messages' importance. That way the OS will ensure that your messages will be sent in order of importance for minimal effort on your part.
The result would depend on how the threads are scheduled to run at that particular instant in time.
One way to potentially avoid undefined behavior with multi-threading is to assume that you are doing memory operations. E.g. updating a linked list or changing a variable, etc.
If you use mutex/semaphores/lock or some other synchronization mechanism, it should work as intended.
I am writing a C program for an embedded Linux (debian-arm) device. In some cases, e.g. if a fatal error occurs on the system/program, I want the program to reboot the system by system("reboot");after logging the error(s) via syslog(). My program includes multithreads, UDP sockets, severalfwrite()/fopen(), malloc() calls, ..
I would like to ask a few question what (how) the program should perform processes just before rebooting the system apart from the syslog. I would appreciate to know how these things are done by the experienced programmers.
Is it necessary to close the open sockets (UDP) and threads just before rebooting? If it is the case, is there a function/system call that closes the all open sockets and threads? If the threads needs to be closed and there is no such global function/call to end them, how I suppose to execute pthread_exit(NULL); for each specific threads? Do I need go use something like goto to end the each threads?
How should the program closes files that fopen and fwrite uses? Is there a global call to close the files in use or do I need to find out the files in use manually then use fclose for the each file? I see see some examples on the forums fflush(), flush(), sync(),.. are used, which one(s) would you recommend to use? In a generic case, would it cause any problem if all of these functions are used (although these could be used unnecessary)?
It is not necessary to free the variables that malloc allocated space, is it?
Do you suggest any other tasks to be performed?
The system automatically issues SIGTERM signals to all processes as one of the steps in rebooting. As long as you correctly handle SIGTERM, you need not do anything special after invoking the reboot command. The normal idiom for "correctly handling SIGTERM" is:
Create a pipe to yourself.
The signal handler for SIGTERM writes one byte (any value will do) to that pipe.
Your main select loop includes the read end of that pipe in the set of file descriptors of interest. If that pipe ever becomes readable, it's time to exit.
Furthermore, when a process exits, the kernel automatically closes all its open file descriptors, terminates all of its threads, and deallocates all of its memory. And if you exit cleanly, i.e. by returning from main or calling exit, all stdio FILEs that are still open are automatically flushed and closed. Therefore, you probably don't have to do very much cleanup on the way out -- the most important thing is to make sure you finish generating any output files and remove any temporary files.
You may find the concept of crash-only software useful in figuring out what does and does not need cleaning up.
The only cleanup you need to do is anything your program needs to start up in a consistent state. For example, if you collect some data internally then write it to a file, you will need to ensure this is done before exiting. Other than that, you do not need to close sockets, close files, or free all memory. The operating system is designed to release these resources on process exit.
Firstly, I'm aware that opening a file with fopen() and not closing it is horribly irresponsible, and bad form. This is just sheer curiosity, so please humour me :)
I know that if a C program opens a bunch of files and never closes any of them, eventually fopen() will start failing. Are there any other side effects that could cause problems outside the code itself? For instance, if I have a program that opens one file, and then exits without closing it, could that cause a problem for the person running the program? Would such a program leak anything (memory, file handles)? Could there be problems accessing that file again once the program had finished? What would happen if the program was run many times in succession?
As long as your program is running, if you keep opening files without closing them, the most likely result is that you will run out of file descriptors/handles available for your process, and attempting to open more files will fail eventually. On Windows, this can also prevent other processes from opening or deleting the files you have open, since by default, files are opened in an exclusive sharing mode that prevents other processes from opening them.
Once your program exits, the operating system will clean up after you. It will close any files you left open when it terminates your process, and perform any other cleanup that is necessary (e.g. if a file was marked delete-on-close, it will delete the file then; note that that sort of thing is platform-specific).
However, another issue to be careful of is buffered data. Most file streams buffer data in memory before writing it out to disk. If you're using FILE* streams from the stdio library, then there are two possibilities:
Your program exited normally, either by calling the exit(3) function, or by returning from main (which implicitly calls exit(3)).
Your program exited abnormally; this can be via calling abort(3) or _Exit(3), dying from a signal/exception, etc.
If your program exited normally, the C runtime will take care of flushing any buffered streams that were open. So, if you had buffered data written to a FILE* that wasn't flushed, it will be flushed on normal exit.
Conversely, if your program exited abnormally, any buffered data will not be flushed. The OS just says "oh dear me, you left a file descriptor open, I better close that for you" when the process terminates; it has no idea there's some random data lying somewhere in memory that the program intended to write to disk but did not. So be careful about that.
The C standard says that calling exit (or, equivalently, returning from main) causes all open FILE objects to be closed as-if by fclose. So this is perfectly fine, except that you forfeit the opportunity to detect write errors.
EDIT: There is no such guarantee for abnormal termination (abort, a failed assert, receipt of a signal whose default behavior is to abnormally terminate the program -- note that there aren't necessarily any such signals -- and other implementation-defined means). As others have said, modern operating systems will clean up all externally visible resources, such as open OS-level file handles, regardless; however, FILEs are likely not to be flushed in that case.
There certainly have been OSes that did not clean up externally visible resources on abnormal termination; it tends to go along with not enforcing hard privilege boundaries between "kernel" and "user" code and/or between distinct user space "processes", simply because if you don't have those boundaries it may not be possible to do so safely in all cases. (Consider, for instance, what happens if you write garbage over the open-file table in MS-DOS, as you are perfectly able to do.)
Assuming you exit under control, using the exit() system call or returning from main(), then the open file streams are closed after flushing. The C Standard (and POSIX) mandate this.
If you exit out of control (core dump, SIGKILL) etc, or if you use _exit() or _Exit(), then the open file streams are not flushed (but the file descriptors end up closed, assuming a POSIX-like system with file descriptors - Standard C does not mandate file descriptors). Note that _Exit() is mandated by the C99 standard, but _exit() is mandated by POSIX (but they behave the same on POSIX systems). Note that file descriptors are separate from file streams. See the discussion of 'Consequences of Program Termination' on the POSIX page for _exit() to see what happens when a program terminates under Unix.
When the process dies, most modern operating systems (the kernel specifically) will free all of your handles and allocated memory.
What are the reasons that an exec (execl,execlp, etc.) can fail? If you make a call to exec and it returns, are there any best practices other than just panicking and calling exit?
The problem with handling exec failure is that usually exec is performed in a child process, and you want to do the error handling in the parent process. But you can't just exit(errno) because (1) you don't know if error codes fit in an exit code, and (2), you can't distinguish between failure to exec and failure exit codes from the new program you exec.
The best solution I know is using pipes to communicate the success or failure of exec:
Before forking, open a pipe in the parent process.
After forking, the parent closes the writing end of the pipe and reads from the reading end.
The child closes the reading end and sets the close-on-exec flag for the writing end.
The child calls exec.
If exec fails, the child writes the error code back to the parent using the pipe, then exits.
The parent reads eof (a zero-length read) if the child successfully performed exec, since close-on-exec made successful exec close the writing end of the pipe. Or, if exec failed, the parent reads the error code and can proceed accordingly. Either way, the parent blocks until the child calls exec.
The parent closes the reading end of the pipe.
From the exec(3) man page:
The execl(), execle(), execlp(), execvp(), and execvP() functions may fail and set errno for any of the errors specified for the library functions execve(2) and malloc(3).
The execv() function may fail and set errno for any of the errors specified for the library function execve(2).
And then from the execve(2) man page:
ERRORS
Execve() will fail and return to the calling process if:
[E2BIG] - The number of bytes in the new process's argument list is larger than the system-imposed limit. This limit is specified by the sysctl(3) MIB variable KERN_ARGMAX.
[EACCES] - Search permission is denied for a component of the path prefix.
[EACCES] - The new process file is not an ordinary file.
[EACCES] - The new process file mode denies execute permission.
[EACCES] - The new process file is on a filesystem mounted with execution disabled (MNT_NOEXEC in <sys/mount.h>).
[EFAULT] - The new process file is not as long as indicated by the size values in its header.
[EFAULT] - Path, argv, or envp point to an illegal address.
[EIO] - An I/O error occurred while reading from the file system.
[ELOOP] - Too many symbolic links were encountered in translating the pathname. This is taken to be indicative of a looping symbolic link.
[ENAMETOOLONG] - A component of a pathname exceeded {NAME_MAX} characters, or an entire path name exceeded {PATH_MAX} characters.
[ENOENT] - The new process file does not exist.
[ENOEXEC] - The new process file has the appropriate access permission, but has an unrecognized format (e.g., an invalid magic number in its header).
[ENOMEM] - The new process requires more virtual memory than is allowed by the imposed maximum (getrlimit(2)).
[ENOTDIR] - A component of the path prefix is not a directory.
[ETXTBSY] - The new process file is a pure procedure (shared text) file that is currently open for writing or reading by some process.
malloc() is a lot less complicated, and uses only ENOMEM. From the malloc(3) man page:
If successful, calloc(), malloc(), realloc(), reallocf(), and valloc() functions return a pointer to allocated memory. If there is an error, they return a NULL pointer and set errno to ENOMEM.
What you do after the exec() call returns depends on the context - what the program is supposed to do, what the error is, and what you might be able to do to work around the problem.
One source of trouble could be that you specified a simple program name instead of a pathname; maybe you could retry with execvp(), or convert the command into an invocation of sh -c 'what you originally specified'. Whether any of these is reasonable depends on the application. If there are major security issues involved, probably you don't try again.
If you specified a pathname and there is a problem with that (ENOTDIR, ENOENT, EPERM), then you may not have any sensible fallback, but you can report the error meaningfully.
In the old days (10+ years ago), some systems did not support the '#!' shebang notation, and if you were not sure whether you were executing an executable or a shell script, you tried it as an executable and then retried it as a shell script. That might or might not work if you were running a Perl script, but in those days, you wrote your Perl scripts to detect that they were being run by a shell and to re-exec themselves with Perl. Fortunately, those days are mostly over.
To the extent possible, it is important to ensure that the process reports the problem so that it can be traced - writing its message to a log file or just to stderr (or maybe even syslog()), so that those who have to work out what went wrong have more information to help them other than the hapless end user's report "I tried X and it didn't work". It is crucial that if nothing works, then the exit status is not 0 as that indicates success. Even that might be ignored - but you did what you could.
Other than just panicking, you could take a decision based on errno's value.
Exec should always succeed
(except for shells, e.g. if the user entered a bogus command).
If exec does fail, it indicates:
a "fault" with the program (missing or bad component, wrong pathname, bad memory, ...), or
a serious system error (out of memory, too many processes, disk fault, ...)
For any serious error, the normal approach is to write the error message on stderr, then exit with a failure code. Almost all of the standard tools do this. For exec:
execl("bork", "bork", NULL);
perror("failed: exec");
exit(127);
The shell does that, too (more or less).
Normally if a child process fails, the parent has failed too and should exit. It does not matter whether the child failed in exec, or while running the program. If exec failed, it does not matter why exec failed. If the child process failed for any reason, the calling process is in trouble and needs to stop.
Don't waste lots of time trying to anticipate all possible error conditions. Don't write code that tries to handle each error code in the best possible way. You'll just bloat the code, and introduce many new bugs. If your program is broken, or it's being abused, it should simply fail. If you force it to continue, worse trouble will come of that.
For example, if the system is out of memory and thrashing swap, we don't want to cycle over and over trying to run a process; it would just make the situation worse. If we get a filesystem error, we don't want to continue running on that filesystem; it might make the corruption worse. If the program was installed wrongly, or has a bug, or has memory corruption, we want to stop as soon as possible, before that broken program does some real damage (such as sending a corrupted report to a client, trashing a database, ...).
One possible alternative: a failing process might call for help, pause itself (SIGSTOP), then retry the operation if told to continue. This could help when the system is out of memory, or disks are full, or perhaps even if there is a fault in the program. Few operations are so expensive and important that this would be worthwhile.
If you're making an interactive GUI program, try to do it as a thin wrapper over reusable command-line tools (which exit if something goes wrong). Every function in your program should be accessible through the GUI, through the command-line, and as a function call. Write your functions. Write a few tools to make commmand-line and GUI wrappers for any function. Use sub-processes too.
If you are making a truly critical system, such as a controller for a nuclear power station, or a program to predict tsunamis, then what are you doing reading my dumb advice? Critical systems should not depend entirely on computers or software. There needs to be a 'manual override', with someone to drive it. Especially, do not attempt to build a critical system on MS Windows; that is like building sandcastles underwater.
I have an application that monitors a high-speed communication link and writes logs to a file (via standard C file IO). The response time to messages that arrive on the link is important, so I knowingly don't fflush the file at each message, because this slows down my response time.
However, in some circumstances my application is terminated "violently" (e.g. by killing the process), and in these cases the last few log messages are not written (even if the communication link has been quiet for some time).
What techniques/strategies can I use to make sure most of my data is flushed, but without giving up speed of response?
Edit: The application runs on Windows
Using a thread is the standard solution to this. Have your data collection code write data to a thread-safe queue and use a semaphore to signal the writing thread.
However, before you go there, double-check your assertion that fflush() would be slow. Most operating systems have a file system cache. It makes writes very fast, as simple memory-to-memory block copy. The data gets written to disk lazily, your crash won't affect it.
If you are on Unix or Linux, your process would receive some termination signal which you can catch (except SIGKILL) and fflush() in your signal handler.
For signal catching see man sigaction.
EDIT: No idea about Windows.
I would suggest an asynchronous write-though. That way you don't need to wait for the write IOP to happen, nor will the OS will delay the IOP. See CreateFile() flags FILE_FLAG_WRITE_THROUGH | FILE_FLAG_OVERLAPPED.
You don't need FILE_FLAG_NO_BUFFERING. That's only to skip the OS cache. You would only need it if you are worried about the entire OS dying violently.
If your program terminates by calling exit() or returning from main(), the C standard guarantees that open streams are flushed and closed, so no special handling is needed. It sounds from your description like this is what is happening: if your program died due to a signal, you wouldn't see the flush.
I'm having trouble understanding what the problem is exactly.
If it's just that you're trying to find a happy medium between flushing often and the default fully buffered output, then maybe line buffering is what you want:
setvbuf(stream, 0, _IOLBF, 0);