Handling C Read-Only File Close Errors

I'm doing some basic file reading using open, read, and close (Files are opened with access mode O_RDONLY).
When it comes time to close the file, I can't think of a good way to handle a possible file close error to make sure that the file is closed properly.
Any suggestions?

In my experience, close will "succeed" even when it fails. There are several reasons for this.
I suspect that one of the big reasons close started to fail on some operating systems was AFS. AFS was a distributed file system from the '80s with interesting semantics: all your writes went to a local cache, and your data was written to the server when you closed the file. AFS was also cryptographically authenticated with tokens that expired after a time. So you could end up in an interesting situation where all the writes to a file were done while your tokens were valid, but the close, which actually talked to the file server, happened with expired tokens, meaning that all the data you wrote to the local cache was lost. This is why close needed a way to tell the user that something went wrong. Most file editors handle this correctly (emacs refuses to mark the buffer as not dirty, for example), but I've rarely seen other applications that can handle it.
That being said, close can't really be allowed to fail anyway. close is implicit during exit, exec (with close-on-exec file descriptors), and crashes that dump core. Those are situations where you can't fail. You can't have exit or a crash fail just because closing a file descriptor failed. What would you do if exit failed? Crash? What if crashing fails? Where would we go from there? Also, since almost no one checks for errors from close, if failure were common you'd end up with file descriptor leaks and information leaks (what if we fail to close some file descriptor before spawning an unprivileged process?). All this is too dangerous for an operating system to allow, so every operating system I've looked at (*BSD, Linux, Solaris) closes the file descriptor even if the underlying filesystem close operation has failed.
In practice this means that you just call close and ignore any error it returns. If you have a graceful way of handling a failure, as editors do, you can notify the user, let them resolve the problem, reopen the file, write the data out again, and try another close. Don't do anything automatically in a loop or otherwise. Errors from close are beyond your control in an application.
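For a read-only file the whole pattern might look like the sketch below (read_file is just an illustrative name): close() is called once, its result is at most logged, and nothing is retried.

    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* Read a file opened O_RDONLY and close it, treating a close error as
     * something to report, not something to "fix" by retrying in a loop. */
    int read_file(const char *path)
    {
        int fd = open(path, O_RDONLY);
        if (fd == -1) {
            fprintf(stderr, "open %s: %s\n", path, strerror(errno));
            return -1;
        }

        char buf[4096];
        ssize_t n;
        while ((n = read(fd, buf, sizeof buf)) > 0) {
            /* ... process n bytes of buf ... */
        }
        if (n == -1)
            fprintf(stderr, "read %s: %s\n", path, strerror(errno));

        /* The descriptor is gone after this call on Linux/*BSD/Solaris,
         * whether or not it reports an error; just log and move on. */
        if (close(fd) == -1)
            fprintf(stderr, "close %s: %s\n", path, strerror(errno));

        return n == -1 ? -1 : 0;
    }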

I guess the best thing to do is to retry a couple of times, with some short delay between attempts, then log the problem and move on.
The manual page for close() mentions EINTR as a possible error, which is why retrying can help.
If your program is about to exit anyway, I wouldn't worry too much about this type of error checking, since any resources you've allocated are going to be deallocated by the operating system anyway (on most/typical desktop/server platforms, that is).
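A minimal sketch of that retry-then-log approach (close_with_retry is an illustrative helper name, not a standard function); note the caveat in the comment about what an EINTR from close() actually means.

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* Sketch of "retry a few times, then log and move on". Be aware that
     * POSIX leaves the descriptor's state unspecified after close() fails
     * with EINTR (on Linux the descriptor is already gone), so whether to
     * retry at all is a judgment call. */
    static void close_with_retry(int fd, const char *name)
    {
        for (int attempt = 0; attempt < 3; attempt++) {
            if (close(fd) == 0)
                return;
            if (errno != EINTR)
                break;              /* only EINTR is worth retrying */
            usleep(1000);           /* short delay between attempts */
        }
        fprintf(stderr, "close %s: %s\n", name, strerror(errno));
    }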

Related

Can you open a directory without blocking on I/O?

I'm working on a Linux/C application with strict timing requirements. I want to open a directory for reading without blocking on I/O (i.e. succeed only if the information is immediately available in cache). If this request would block on I/O I would like to know so that I can abort and ignore this directory for now. I know that open() has a non-blocking option O_NONBLOCK. However, it has this caveat:
    Note that this flag has no effect for regular files and block devices; that is, I/O operations will (briefly) block when device activity is required, regardless of whether O_NONBLOCK is set.
I assume that a directory entry is treated like a regular file. I don't know of a good way to prove/disprove this. Is there a way to open a directory without any I/O blocking?
You could try using the coproc command in Linux (a shell built-in) to run the blocking work in a background process. Maybe that could work for you.
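coproc is a shell facility, but a rough C analogue of "run it in the background" is to push the potentially blocking opendir()/readdir() into a child process and poll it, so the timing-critical loop never waits on disk I/O. A minimal sketch, with start_dir_scan and check_dir_scan as made-up names (a real version would ship the entries back to the parent over a pipe and reap the child when done):

    #include <dirent.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    /* Start scanning a directory in a child process, so that any disk I/O
     * blocks the child rather than the timing-critical caller. */
    pid_t start_dir_scan(const char *path)
    {
        pid_t pid = fork();
        if (pid != 0)
            return pid;                 /* parent: -1 on error, else child pid */

        DIR *d = opendir(path);         /* child: this may block on I/O */
        if (!d)
            _exit(1);
        struct dirent *e;
        while ((e = readdir(d)) != NULL) {
            /* ... process e->d_name, or send it to the parent over a pipe ... */
        }
        closedir(d);
        _exit(0);
    }

    /* Poll from the timing-critical loop: returns 1 if the child is still
     * busy (presumably blocked on I/O), 0 if it finished, -1 on failure. */
    int check_dir_scan(pid_t pid)
    {
        int status;
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == 0)
            return 1;
        if (r == pid && WIFEXITED(status) && WEXITSTATUS(status) == 0)
            return 0;
        return -1;
    }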

What is the most suitable IPC between programs in C

I've been building two programs in C. The first program is a service that automatically shuts down the PC if the time is past 22:00, with a one-minute countdown. At the same time it opens the second program, which asks for a username and password via ShellExecution in the Win32 API. If the login in the second program is successful, the shutdown is totally cancelled. When the second program is closed or terminated, the shutdown process starts all over again if the time is still past 22:00. My problem is how I should make this kind of communication happen between the two programs. I'm thinking of using shared memory or a pipe, but I'm not sure what kind of IPC is suitable for this situation. Note that if the second program is already open and logged in, the shutdown process should not occur, and it should work even if multiple instances of the second program are open and logged in.
I don't know if you need another process. I'd be tempted to do the login/password with another thread. That gives you shared memory with little effort.
Not being a crack Windows programmer, I don't know if you can do the ShellExecution in another thread, or if protections will get in your way.
If that didn't work and you had to use another process, I'd try a UNIX-style pipe, but those don't exist on Windows; you'd have to use named pipes. Pipes are usually better than shared memory for security and complexity reasons. A trick is to make sure that only the correct program is on the other end of the pipe. That may or may not matter depending on how clever you expect an attacker to be.
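If you go the named-pipe route, the service end might look roughly like the sketch below. The pipe name and the "OK" message are made-up conventions for illustration; the login program would open the same name with CreateFileA (GENERIC_WRITE, OPEN_EXISTING) and send its status with WriteFile.

    #include <windows.h>
    #include <stdio.h>

    /* Hypothetical pipe name; both programs must agree on it. */
    #define PIPE_NAME "\\\\.\\pipe\\LoginStatusPipe"

    int main(void)
    {
        HANDLE pipe = CreateNamedPipeA(
            PIPE_NAME,
            PIPE_ACCESS_INBOUND,              /* the service only reads */
            PIPE_TYPE_MESSAGE | PIPE_READMODE_MESSAGE | PIPE_WAIT,
            PIPE_UNLIMITED_INSTANCES,
            512, 512, 0, NULL);
        if (pipe == INVALID_HANDLE_VALUE) {
            fprintf(stderr, "CreateNamedPipe failed: %lu\n", GetLastError());
            return 1;
        }

        if (ConnectNamedPipe(pipe, NULL) ||
            GetLastError() == ERROR_PIPE_CONNECTED) {
            char msg[16] = {0};
            DWORD got = 0;
            if (ReadFile(pipe, msg, sizeof msg - 1, &got, NULL) && got > 0) {
                /* e.g. the login program sends "OK" on a successful login */
                printf("login program said: %s\n", msg);
            }
        }
        CloseHandle(pipe);
        return 0;
    }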

What happens to open files which are not properly closed?

What happens if I do not close a file after writing to it?
Let us assume we get a "too many open files" error and the program crashes because of it.
Does the OS handle that for me? And if this damages the unclosed files, how do I notice that they are damaged?
From the exit() manual page:
    _exit() does close open file descriptors, and this may cause an unknown delay, waiting for pending output to finish.
Returning from main() implies a call to exit (and ultimately the _exit system call), so any descriptor you left open is closed by the OS.
Generally speaking, if you write to a file and your application then crashes, the operating system will flush the buffers to the disk and clean up for you. The same happens if your program exits without explicitly closing its files. This does not damage the files.
The bad situation is when you write to a file and someone pulls the plug out on the computer.
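If you do care about the pulled-plug case, the usual answer is to force the data to stable storage with fsync() before closing. A minimal sketch (write_durably is just an illustrative name):

    #include <errno.h>
    #include <fcntl.h>
    #include <unistd.h>

    /* Write a buffer and push it to stable storage before closing, so a
     * power cut right afterwards doesn't take the data with it. */
    int write_durably(const char *path, const void *buf, size_t len)
    {
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd == -1)
            return -1;

        const char *p = buf;
        while (len > 0) {
            ssize_t n = write(fd, p, len);
            if (n == -1) {
                if (errno == EINTR)
                    continue;
                close(fd);
                return -1;
            }
            p += n;
            len -= (size_t)n;
        }

        if (fsync(fd) == -1) {      /* force the data out of the OS cache */
            close(fd);
            return -1;
        }
        return close(fd);
    }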

Does umount asynchronously release the underlying device?

I have some code that umounts a file system on a device and then immediately removes the device from device-mapper using the DM_DEV_REMOVE ioctl command.
Sometimes, as part of a stress test, I run this code in a tight loop of:
create the device
mount the file system on the device
unmount the file system
remove the device
Often, when running this test over thousands of iterations, I will eventually get the errno EBUSY when trying to remove the device. The umount is always successful.
I have tried searching on this issue, but mostly what I find is people having issues with getting EBUSY when umounting, which is not the problem I am having.
The closest thing to being helpful that I could find is that in the man page for dmsetup it talks about using the --retry option as a workaround for udev rules opening up devices when you are trying to remove them. Unfortunately for me though, I have been able to confirm that udev does not have my device open when I am trying to remove it.
I have used the DM_DEV_STATUS command to check the open_count for my device. The open_count is always 1 before the umount; when my test succeeds it is 0 after the umount, and when it fails it is still 1 after the umount.
Now, what I am trying to find out to root-cause my issue is, "Could my resource busy failure be caused by umount asynchronously releasing my device, thus creating a race condition?". I know that umount is supposed to be synchronous when it comes to the actual unmounting, but I couldn't find any documentation for whether releasing/closing the underlying device could occur asynchronously or not.
And if it isn't umount holding an open handle to my device, are there any other likely candidates?
My test is running on a 3.10 kernel.
Historically, system calls blocked the calling process until the whole task was done (write(2) to a block device being the first major exception, for obvious reasons). The reason was that you needed one process to do the job, and the process issuing the syscall was there for exactly that purpose (and the CPU time could be charged to that user's account).
Nowadays there are plenty of kernel threads that take care of work not tied to any process, and umount(2) could be one of the syscalls that pushes some of its work into the background (I don't think it is, since umount(2) isn't issued frequently enough to justify such a change in the code).
But Linux is not a direct UNIX descendant, so umount(2) could be implemented that way. I don't believe it is, though.
The umount(2) syscall normally succeeds, except when inodes on the filesystem are in use, which is not your case. But the kernel can be busy with some heavy-duty work that forces it to allocate (non-swappable) kernel memory and fail the request; that could lead to the error you get (note that this is only a guess, I have not checked the code, and you had better look at the umount(2) implementation yourself).
There's another issue that could block your umount (or make it fail) if you have touched the filesystem in some way. There is dependency-tracking code that lets filesystems survive power failures in a consistent state (on Linux this is called ordered data; on BSD systems it is called soft updates, which is why erased files are not freed immediately after unlink(2)). This could block umount(2) (or make it fail) if some data still has to be written to the filesystem before the actual umount(2) can proceed. But again, this should not be your case, since, as you say, you don't modify the mounted filesystem.
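If the failure really is a transient race, bounded retries on EBUSY are the pragmatic workaround, which is roughly what dmsetup's --retry option does. A sketch of that idea, where remove_dm_device() is a hypothetical placeholder for whatever wrapper issues the DM_DEV_REMOVE ioctl and is assumed to return -1 with errno set on failure:

    #include <errno.h>
    #include <unistd.h>

    /* Hypothetical wrapper around the DM_DEV_REMOVE ioctl. */
    int remove_dm_device(const char *name);

    /* Bounded retry on EBUSY; anything else is reported immediately. */
    int remove_with_retry(const char *name, int attempts)
    {
        for (int i = 0; i < attempts; i++) {
            if (remove_dm_device(name) == 0)
                return 0;
            if (errno != EBUSY)
                return -1;          /* a different error: don't keep hammering */
            usleep(100 * 1000);     /* give whoever holds the device time to let go */
        }
        return -1;                  /* still busy after all attempts */
    }

Note that this papers over the race rather than explaining it; it doesn't answer who is holding the device open.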

Delayed Write errors

For the past few months, we've been losing data to Delayed Write errors. I've experienced the error with both custom code and shrink-wrap applications. For example, the error message below came from Visual Studio 2008 when building a solution:
    Windows - Delayed Write Failed: Windows was unable to save all the data for the file \Vital\Source\Other\OCHSHP\Done07\LHFTInstaller\Release\LHFAI.CAB. The data has been lost. This error may be caused by a failure of your computer hardware or network connection. Please try to save this file elsewhere.
When it occurs in Adobe, Visual Studio, or Word, for example, no harm is done. The major problem is when it happens to our custom applications (straight C apps that write data in dBase files to a network share).
From the program's perspective, the write succeeds. It deletes the source data, and goes on to the next record. A few minutes later, Windows pops up an error message saying that a delayed write occurred and the data was lost.
My question is: what can we do to help our networking/server teams isolate and correct the problem (read: convince them the problem is real; simply telling them many, many times hasn't convinced them yet), and do you have any suggestions for how we can write so as to avoid the data loss?
Windows, like any modern operating system, does not actually send writes to the disk until it gets around to it. This is a big performance win, but the problem (as you have found) is that you cannot detect errors at the time of the write.
Every operating system that does asynchronous writes also provides mechanisms for forcing data to disk. On Windows, the FlushFileBuffers or _commit function will do the trick. (One is for HANDLEs, the other for file descriptors.)
Note that you must check the return value of every disk write, and the return value of these synchronizing functions, in order to be certain the data made it to disk. Also note that these functions block and wait for the data to reach disk -- even if you are writing to a network server -- so they can be slow. Do not call them until you really need to push the data to stable storage.
For more, see fsync() Across Platforms.
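Concretely, for HANDLE-based I/O that might look like the sketch below (append_record is an illustrative name); only after both calls succeed should the program delete its source data.

    #include <windows.h>
    #include <stdio.h>

    /* Append a record and force it to stable storage, checking every step. */
    BOOL append_record(HANDLE h, const void *rec, DWORD len)
    {
        DWORD written = 0;
        if (!WriteFile(h, rec, len, &written, NULL) || written != len) {
            fprintf(stderr, "WriteFile failed: %lu\n", GetLastError());
            return FALSE;
        }
        if (!FlushFileBuffers(h)) {   /* blocks until the data reaches the server/disk */
            fprintf(stderr, "FlushFileBuffers failed: %lu\n", GetLastError());
            return FALSE;
        }
        return TRUE;                  /* only now is it safe to delete the source data */
    }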
You have a corrupted file system or a hard disk that is failing. The networking/server team should scan the disk to fix the former and detect the latter. Also check the error log to see if it tells you anything. If the error log indicates a failure to write to the hardware, then you need to replace the disk.
