Accessing a file from several processes - C

This is a design question more than a coding problem. I have a parent process that will fork many children, and each child is supposed to read and write the same text file.
How can we achieve this safely?
My thoughts:
Create the file pointer in the parent, then create a binary semaphore on it. Processes will compete to obtain the file pointer and write to the file. In the read case I don't need a semaphore.
Please tell me if I got it wrong.
I am using C under linux.
Thank you.

POSIX systems have kernel-level file locks using fcntl and/or flock. Their history is a bit complicated and their use and semantics are not always obvious, but they do work, especially in simple cases. For locking an entire file, flock is easier to use IMO. If you need to lock only parts of a file, fcntl provides that ability.
As an aside, file locking over NFS is not safe on all (most?) platforms.
man 2 flock
man 2 fcntl
http://en.wikipedia.org/wiki/File_locking#In_Unix-like_systems
Also, keep in mind that file locks are "advisory" only. They don't actually prevent you from reading or writing a file if you bypass acquiring the lock.
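For the whole-file case, a minimal sketch with flock() might look like this (the file name is just an example, and error handling is abbreviated):

    /* Append one record under an exclusive whole-file lock. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/file.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("shared.txt", O_WRONLY | O_APPEND | O_CREAT, 0644);
        if (fd < 0) { perror("open"); return 1; }

        if (flock(fd, LOCK_EX) < 0) {   /* blocks until the lock is granted */
            perror("flock");
            return 1;
        }

        const char *msg = "one whole record\n";
        if (write(fd, msg, strlen(msg)) < 0)
            perror("write");

        flock(fd, LOCK_UN);             /* close(fd) would also release it */
        close(fd);
        return 0;
    }

Remember that every cooperating process has to follow the same protocol; the lock does nothing against a process that ignores it.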

If writers are appending data to the file, your approach seems fine (at least up until the file becomes too large for the file system).
If writers are doing file replacement, then I would approach it something like this:
The reading API would check the time of last modification (with fstat()) against a cached value. If the time has changed, the file is re-opened, and the cached modification time updated, before the read is performed.
The writing API would acquire a lock, and write to a temporary file. Then, the actual data file is replaced by calling rename(), after which the lock is released.
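A sketch of that write path (the file names are illustrative; a separate lock file is assumed so the lock survives the rename):

    /* Write a temporary file, then atomically swap it in with rename(). */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/file.h>
    #include <unistd.h>

    int replace_file(const char *contents)
    {
        int lockfd = open("data.txt.lock", O_CREAT | O_RDWR, 0644);
        if (lockfd < 0) return -1;
        flock(lockfd, LOCK_EX);                /* serialize the writers */

        FILE *tmp = fopen("data.txt.tmp", "w");
        if (!tmp) { close(lockfd); return -1; }
        fputs(contents, tmp);
        fflush(tmp);
        fsync(fileno(tmp));                    /* push the data to disk */
        fclose(tmp);

        int rc = rename("data.txt.tmp", "data.txt");  /* atomic swap */

        flock(lockfd, LOCK_UN);
        close(lockfd);
        return rc;
    }

Readers that already have the old file open keep seeing the old contents; new opens see the new file.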
If writers can write anywhere in the file, then you probably want a more structured file than plain text, something closer to a database. In such a case, some kind of reader-writer lock should be used to manage data consistency and data integrity.

Related

Linux: Reading file while other program might modify it

A program Foo periodically updates a file and calls my C program Bar to process the file.
The issue is that Foo might update the file, call Bar to process it, and while Bar reads the file, Foo might update the file again.
Is it possible for Bar to read the file in inconsistent state, e.g. read first half of the file as written by first Foo and the other half as written by the second Foo? If so, how would I prevent that, assuming I can modify only Bar's code?
Typically, Foo should not simply rewrite the contents of the file again and again, but create a new temporary file, and replace the old file with the temporary file when it is done (using link()). In this case, simply opening the file (at any point in time) will give the reader a consistent snapshot of the contents, because of how typical POSIX filesystems work. (After opening the file, the file descriptor will refer to the same inode/contents, even if the file gets deleted or replaced; the disk space will be released only after the last open file descriptor of a deleted/replaced file is closed.)
If Foo does rewrite the same file (without a temporary file) over and over, the recommended solution would be for both Foo and Bar to use fcntl()-based advisory locking. (However, using a temporary file and renaming/linking it over the actual file when complete, would be even better.)
(While flock()-based locking might seem easier, it is actually a bit of a guessing game whether it works on NFS mounts or not. fcntl() works, unless the NFS server is configured not to support locking. Which is a bit of an issue on some commercial web hosts, actually.)
If you cannot modify the behaviour of Foo, and it does not use advisory locking, there are still some options in Linux.
If Foo closes the file -- i.e., Bar is the only one to open the file --, then taking an exclusive file lease (using fcntl(descriptor, F_SETLEASE, F_WRLCK)) is a workable solution. You can only get an exclusive file lease if descriptor is the only open descriptor on the file, and the owner user of the file is the same as the process UID (or the process has the CAP_LEASE capability). If any other process tries to open or truncate the file, the lease owner gets signaled (SIGIO by default), and has up to /proc/sys/fs/lease-break-time seconds to downgrade or release the lease. The opener is blocked for the duration, which allows Bar to either cancel the processing, or copy the file for later processing.
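A sketch of taking such a lease (this assumes Bar owns the file and holds the only open descriptor; on glibc, _GNU_SOURCE is needed for F_SETLEASE):

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <signal.h>
    #include <stdio.h>
    #include <unistd.h>

    static volatile sig_atomic_t lease_broken = 0;
    static void on_sigio(int sig) { (void)sig; lease_broken = 1; }

    int main(void)
    {
        signal(SIGIO, on_sigio);            /* default lease-break signal */

        int fd = open("data.txt", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        if (fcntl(fd, F_SETLEASE, F_WRLCK) < 0) {  /* exclusive lease */
            perror("F_SETLEASE");
            return 1;
        }

        /* ... read or copy the file, checking lease_broken as you go ... */

        fcntl(fd, F_SETLEASE, F_UNLCK);     /* release; any opener unblocks */
        close(fd);
        return 0;
    }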
The other option for Bar is rather violent. It can monitor the file, say, once per second, and when the file is old enough (say, a few seconds), pause Foo by sending it a SIGSTOP signal, check /proc/FOOPID/stat until it is actually stopped, and recheck the file statistics to verify the file is still old, before making a temporary copy of it (either in memory, or on disk) for processing. After the file is read/copied, Bar can let Foo continue by sending it a SIGCONT signal.
Some filesystems may support file snapshots, but in my opinion, either of the above is much saner than relying on nonstandard filesystem support to function correctly. If Foo cannot be modified to co-operate, it is time to refactor it out of the picture. You do not want to be held hostage by a black box out of your control, so the sooner you replace it with something more user/administrator-friendly, the better off you'll be in the long term.
This is difficult to do robustly without Foo's cooperation.
Unixes have two main kinds of file locking:
range locking with fcntl(2)
always-whole-file locking with flock(2)
Ideally, you use either of these in cooperative mode (advisory locking), where all participants attempt to acquire the lock and only one will get it at a time.
Without the other program's cooperation, your only recourse, as far as I know, is mandatory locking, which you can have with fcntl if you allow it on the filesystem, but the manpage mentions that the Linux implementation is unreliable.
In all UN*X systems, what is guaranteed to happen atomically is the individual write(2) or read(2) system call. The kernel even locks the file inode in memory, so while you are read(2)ing or write(2)ing it, it will not change.
For atomicity across larger spans, you have to lock the file or a region of it. You can use the available file-locking tools to lock different regions of a file. Some locks are advisory (a process can simply skip acquiring them) and others are mandatory (you are blocked until the other side unlocks the file region).
See fcntl(2) and the commands F_GETLK, F_SETLK and F_SETLKW, which query lock information, set a lock without blocking, and set a lock and wait for it, respectively.
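A sketch of a blocking range lock with fcntl() (zero-initializing the struct; F_RDLCK for readers, F_WRLCK for writers):

    #include <fcntl.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* Lock [start, start+len) for writing; blocks until granted. */
    int lock_range(int fd, off_t start, off_t len)
    {
        struct flock fl = {0};
        fl.l_type   = F_WRLCK;     /* or F_RDLCK for a shared lock */
        fl.l_whence = SEEK_SET;
        fl.l_start  = start;
        fl.l_len    = len;         /* len 0 would mean "to end of file" */
        return fcntl(fd, F_SETLKW, &fl);   /* the blocking variant */
    }

    int unlock_range(int fd, off_t start, off_t len)
    {
        struct flock fl = {0};
        fl.l_type   = F_UNLCK;
        fl.l_whence = SEEK_SET;
        fl.l_start  = start;
        fl.l_len    = len;
        return fcntl(fd, F_SETLK, &fl);
    }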

C read and thread safety (Linux)

What would happen if you called read (or write, or both) in two different threads on the same file descriptor (let's say we are interested in a local file, or in a socket file descriptor), without explicitly using a synchronization mechanism?
Read and write are syscalls, so on a single-core CPU it's probably unlikely that two reads would be executed "at the same time". But with multiple cores...
What will the Linux kernel do?
And let's be a bit more general: is the behavior always the same for other kernels (like the BSDs)?
Edit: According to the close documentation, we should be sure that the file descriptor isn't being used by a syscall in another thread. So it seems that explicit synchronization would be required before closing a file descriptor (and so, also around read/write, if threads that may call them are still running).
Any system level (syscall) file descriptor access is thread safe in all mainstream UNIX-like OSes.
Though, depending on their age, they are not necessarily signal-safe.
If you call read, write, accept or similar on a file descriptor from two different tasks then the kernel's internal locking mechanism will resolve contention.
For reads, each byte may be read only once, though, and writes will land in an undefined order.
The stdio library functions fread, fwrite and co. also have internal locking on their control structures by default, though it is possible to disable that (e.g. with glibc's __fsetlocking()).
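For example, POSIX flockfile()/funlockfile() let one thread hold a stream's internal lock across several calls, so a multi-part record cannot be interleaved (a sketch):

    #include <stdio.h>

    void log_record(FILE *fp, const char *tag, const char *msg)
    {
        flockfile(fp);            /* take the stream's internal lock */
        fputs(tag, fp);
        fputs(": ", fp);
        fputs(msg, fp);
        fputc('\n', fp);
        funlockfile(fp);          /* the record went out in one piece */
    }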
The comment about close is because it doesn't make a lot of sense to close a file descriptor in any situation in which some other thread might be trying to use it. So while it is 'safe' as far as the kernel is concerned, it can lead to odd, hard to diagnose corner cases.
If a thread closes a file descriptor while a second thread is trying to read from it, the second thread may get an unexpected EBADF error. Worse, if a third thread is simultaneously opening a new file, that might reallocate the same fd, and the second thread might accidentally read from the new file rather than the one it was expecting...
Have a care for those who follow in your footsteps
It's perfectly normal to protect the file descriptor with a mutex. It removes any dependence on kernel behaviour, so your message boundaries are now certain. You then don't have to cite the last paragraph at the bottom of a 15,489-line manpage which explains why the mutex isn't necessary (I exaggerated, but you get my meaning).
It also makes it clear to anyone reading your code that the file descriptor is being used by more than one thread.
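A minimal sketch of that, assuming the threads share one descriptor and each message should go out whole:

    #include <pthread.h>
    #include <string.h>
    #include <unistd.h>

    static pthread_mutex_t fd_mutex = PTHREAD_MUTEX_INITIALIZER;

    /* Serialize writes so message boundaries no longer depend on the kernel. */
    ssize_t locked_write(int fd, const char *msg)
    {
        pthread_mutex_lock(&fd_mutex);
        ssize_t n = write(fd, msg, strlen(msg));
        pthread_mutex_unlock(&fd_mutex);
        return n;
    }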
Fringe Benefit
There is a fringe benefit to using a mutex that way. Suppose you've got different messages coming from the different threads and some of those messages are more important than others. All you need to do is set the thread priorities to reflect their messages' importance. That way the OS will ensure that your messages will be sent in order of importance for minimal effort on your part.
The result would depend on how the threads are scheduled to run at that particular instant in time.
One way to avoid potential undefined behavior with multi-threading is to reason as if you were doing ordinary shared-memory operations, e.g. updating a linked list or changing a variable.
If you use mutex/semaphores/lock or some other synchronization mechanism, it should work as intended.

flock(), then fgets(): low-level locks, then stdio read/write library functions. Is it possible?

I'm writing a web server.
Each connection is served by a separate thread, so I don't know in advance the number of threads.
There is also a group of text files (I don't know their number either), and each thread can read/write each file.
A file can be written by just one thread at a time, but different threads can write to different files at the same time.
If a file is being read by one or more threads (reads can be concurrent), no thread can write to THAT file.
Now, I noticed this solution (Thread safe multi-file writing), but I'd also like to use functions such as fgets(), for example.
So, can I flock() a file, and then use a fgets() or another stdio read/write library function?
First of all, use fcntl, not flock. The latter is a non-standard, deprecated BSD function and does not work with NFS and possibly other filesystems. fcntl locking on the other hand is POSIX standard and is intended to work everywhere.
Now if you want to use file-level reader-writer locking mixed with stdio, it will work, but you have to take some care to ensure that buffering does not break your assumptions about locks. The method I'm about to explain is not the only one, but I believe it's the clearest/simplest:
When you want to operate on one of your files with stdio, obtaining the correct type of lock (read or write, aka shared or exclusive) should be the first thing you do after fopen. Use fileno to get the file descriptor number and apply the lock to it. After that, perform your entire read or write operation. Do not make any attempt to unlock the file; instead, call fclose to close the file and let the lock be released implicitly on close. Otherwise you may release the lock while buffered data is still unwritten, or later read stale data that was buffered before the lock was released and is no longer valid afterwards.
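A sketch of that pattern (lock immediately after fopen(), and let fclose() do the flushing and unlocking):

    #include <fcntl.h>
    #include <stdio.h>

    /* Open path and take a whole-file lock: F_RDLCK shared, F_WRLCK exclusive. */
    FILE *open_locked(const char *path, const char *mode, short lock_type)
    {
        FILE *fp = fopen(path, mode);
        if (!fp) return NULL;

        struct flock fl = {0};
        fl.l_type   = lock_type;
        fl.l_whence = SEEK_SET;    /* l_start = 0, l_len = 0: whole file */

        if (fcntl(fileno(fp), F_SETLKW, &fl) < 0) {
            fclose(fp);
            return NULL;
        }
        return fp;
    }

Usage would be along the lines of fp = open_locked("data.txt", "r", F_RDLCK); then fgets() as usual; then fclose(fp), which flushes, closes the descriptor, and thereby drops the lock.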

Deleting a possibly locked file in C

I am using fcntl locks in C on Linux and face a dilemma: I am trying to delete a file that may be locked by other processes that also use the fcntl locking mechanism. What would be the preferred way of handling this file which must be deleted? Should I simply delete the file without regard for other processes that may hold read locks, or is there a better way?
Any help would be much appreciated.
On UNIX systems, it is possible to unlink a file while it is still open; doing so decrements the reference count on the file, but the actual file and its inode remain around until the reference count goes to zero.
As others have noted, you are free to delete the file even while you hold the lock.
Now, a cautionary note: you didn't mention why processes are locking this file, but you should be aware that if you are using that file for interprocess synchronization, deleting it is a good way to introduce subtle race conditions into your system, basically because there's no way to atomically create AND lock the file in a single operation.
For example, process AA might create the file, with the intention of locking it immediately to do whatever updates it needs to do. However, there's nothing to prevent process BB from grabbing the lock on the file first, then deleting the file, leaving process AA with a handle to the now deleted file. Process AA will still be able to lock and update that file, but those updates will effectively be "lost" because the file's already been deleted.
Moreover, locks on UNIX systems are advisory by default, not mandatory, so locking a file does not prevent it from being opened or unlinked, just from being locked again.
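One way to close that race, if you control all the participants, is to verify after locking that the path still names the inode you opened, and retry otherwise. A sketch (assuming all processes use this same routine):

    #include <fcntl.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int open_and_lock(const char *path)
    {
        for (;;) {
            int fd = open(path, O_CREAT | O_RDWR, 0644);
            if (fd < 0) return -1;

            struct flock fl = {0};
            fl.l_type   = F_WRLCK;
            fl.l_whence = SEEK_SET;
            if (fcntl(fd, F_SETLKW, &fl) < 0) { close(fd); return -1; }

            struct stat opened, onpath;
            if (fstat(fd, &opened) == 0 && stat(path, &onpath) == 0 &&
                opened.st_ino == onpath.st_ino && opened.st_dev == onpath.st_dev)
                return fd;        /* we hold the lock on the live file */

            close(fd);            /* it was deleted or replaced: retry */
        }
    }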

fopen two processes

Is it OK to have two processes writing to the same file multiple times within a certain span of time, using fopen (append mode), fprintf, and fclose?
They wait for data and repeat the open, write, close operation.
Is there a chance of data not being written?
If your two processes are accessing the same file, you should use a Mutex to prevent both processes from writing to the file at the same time.
The link above is for Windows platforms. For other platforms, you'll need the platform's appropriate construct, such as a pthreads mutex created with PTHREAD_PROCESS_SHARED.
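A sketch of setting one up in POSIX shared memory ("/myapp-lock" is an example name; one process should do the initialization before the others attach):

    #include <fcntl.h>
    #include <pthread.h>
    #include <sys/mman.h>
    #include <unistd.h>

    pthread_mutex_t *create_shared_mutex(void)
    {
        int fd = shm_open("/myapp-lock", O_CREAT | O_RDWR, 0644);
        if (fd < 0) return NULL;
        if (ftruncate(fd, sizeof(pthread_mutex_t)) < 0) { close(fd); return NULL; }

        pthread_mutex_t *m = mmap(NULL, sizeof *m, PROT_READ | PROT_WRITE,
                                  MAP_SHARED, fd, 0);
        close(fd);
        if (m == MAP_FAILED) return NULL;

        pthread_mutexattr_t attr;
        pthread_mutexattr_init(&attr);
        pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        pthread_mutex_init(m, &attr);   /* only the first process should init */
        pthread_mutexattr_destroy(&attr);
        return m;
    }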
The fopen() function opens the file with no sharing protections (SH_DENYNO), at least on Win32/VC++, so without some kind of locking protocol you'll potentially corrupt the file.
If your app is Win32 you can use the CreateFile API to specify sharing modes. The _fsopen() function is somewhat more portable, but not as portable as fopen().
If you're not on Windows, the easiest way among cooperating processes is to use flock. (Use fdopen to get a FILE from a fd.)
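A sketch of that combination (lock the descriptor first, then let stdio handle the formatting; fclose() flushes, closes the fd, and releases the lock):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/file.h>
    #include <unistd.h>

    int append_line(const char *path, const char *line)
    {
        int fd = open(path, O_WRONLY | O_APPEND | O_CREAT, 0644);
        if (fd < 0) return -1;
        if (flock(fd, LOCK_EX) < 0) { close(fd); return -1; }

        FILE *fp = fdopen(fd, "a");   /* FILE* over the locked descriptor */
        if (!fp) { close(fd); return -1; }
        fprintf(fp, "%s\n", line);
        fclose(fp);                   /* flush, close fd, lock released */
        return 0;
    }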
