Is it ok to have two processes writing to same file multiple times in a certain span of time usign fopen(append mode), fprintf, fclose functionality.
they wait for data and repeat the open,write,close operation.
is there a chance of data not being written?
If your two processes are accessing the same file, you should use a Mutex to prevent both processes from writing to the file at the same time.
The link above is for Windows platforms. For other platforms, you'll need the platform's appropriate construct, such as a pthreads mutex created with PTHREAD_PROCESS_SHARED.
The fopen() function opens the file with no sharing protections (SH_DENYNO), at least on Win32/VC++, so without some kind of locking protocol you'll potentially corrupt the file.
If your app is Win32 you can use the CreateFile API to specify sharing modes. The _fsopen() function is somewhat more portable, but not as portable as fopen().
If you're not on Windows, the easiest way among cooperating processes is to use flock. (Use fdopen to get a FILE from a fd.)
Related
What would happen if you call read (or write, or both) in two different thread, on the same file descriptor (lets says we are interested about a local file, and a it's a socket file descriptor), without using explicitly a synchronization mechanism?
Read and Write are syscall, so, on a single core CPU, it's probably unlucky that two read would be executed "at the same time". But with multiple cores...
What the linux kernel will do?
And let's be a bit more general : is the behavior always the same for other kernels (like BSDs) ?
Edit : According to the close documentation, we should be sure that the file descriptor isn't used by a syscall in an other thread. So it seams that explicit synchronization would be required before closing a file descriptor (and so, also around read/write if thread that may call it are still running).
Any system level (syscall) file descriptor access is thread safe in all mainstream UNIX-like OSes.
Though depending on the age they are not necessarily signal safe.
If you call read, write, accept or similar on a file descriptor from two different tasks then the kernel's internal locking mechanism will resolve contention.
For reads each byte may be only read once though and writes will go in any undefined order.
The stdio library functions fread, fwrite and co. also have by default internal locking on the control structures, though by using flags it is possible to disable that.
The comment about close is because it doesn't make a lot of sense to close a file descriptor in any situation in which some other thread might be trying to use it. So while it is 'safe' as far as the kernel is concerned, it can lead to odd, hard to diagnose corner cases.
If a thread closes a file descriptor while a second thread is trying to read from it, the second thread may get an unexpected EBADF error. Worse, if a third thread is simultaneously opening a new file, that might reallocate the same fd, and the second thread might accidentally read from the new file rather than the one it was expecting...
Have a care for those who follow in your footsteps
It's perfectly normal to protect the file descriptor with a mutex semaphore. It removes any dependence on kernel behaviour so your message boundaries are now certain. You then don't have to cite the last paragraph at the bottom of a 15,489 line manpage which explains why the mutex isn't necessary (I exaggerated, but you get my meaning)
It also makes it clear to anyone reading your code that the file descriptor is being used by more than one thread.
Fringe Benefit
There is a fringe benefit to using a mutex that way. Suppose you've got different messages coming from the different threads and some of those messages are more important than others. All you need to do is set the thread priorities to reflect their messages' importance. That way the OS will ensure that your messages will be sent in order of importance for minimal effort on your part.
The result would depend on how the threads are scheduled to run at that particular instant in time.
One way to potentially avoid undefined behavior with multi-threading is to assume that you are doing memory operations. E.g. updating a linked list or changing a variable, etc.
If you use mutex/semaphores/lock or some other synchronization mechanism, it should work as intended.
this is a design question more than a coding problem. I have a parent process that will fork many children. Each of the children is supposed to read and write on the same text file.
How can we achieve this safely?
My thoughts:
create the file pointer in the parent, then create a binary semaphore on it. And processes will compete on obtaining the file pointer and write on the file. In the read case i don't need a semaphore.
Please tell me if i got it wrong.
I am using C under linux.
Thank you.
POSIX systems have kernel level file locks using fcntl and/or flock. Their history is a bit complicated and their use and semantics not always obvious but they do work, especially in simple cases. For locking an entire file, flock is easier to use IMO. If you need to lock only parts of a file, fcntl provides that ability.
As an aside, file locking over NFS is not safe on all (most?) platforms.
man 2 flock
man 2 fcntl
http://en.wikipedia.org/wiki/File_locking#In_Unix-like_systems
Also, keep in mind that file locks are "advisory" only. They don't actually prevent you from writing/reading/etc to a file if you bypass acquiring the lock.
If writers are appending data to the file, your approach seems fine (at least up until the file becomes too large for the file system).
If writers are doing file replacement, then I would approach it something like this:
The reading API would check the time of last modification (with fstat()) against a cached value. If the time has changed, the file is re-opened, and the cached modification time updated, before the read is performed.
The writing API would acquire a lock, and write to a temporary file. Then, the actual data file is replaced by calling rename(), after which the lock is released.
If writers can write anywhere in the file, then you probably want are more structured file than just plain text, similar to a database. In such a case, some kind of reader-writer lock should be used to manage data consistency and data integrity.
I'm writing a server web.
Each connection is served by a separate thread, so I don't know in advance the number of threads.
There are also a group of text files (don't know the number, too), and each thread can read/write on each file.
A file can be written by just one thread a time, but different threads can write on different files at the same time.
If a file is read by one or more threads (reads can be concurrent), no thread can write on THAT file.
Now, I noticed this (Thread safe multi-file writing) solution, but I'd like also to use functions as fgets(), for example.
So, can I flock() a file, and then use a fgets() or another stdio read/write library function?
First of all, use fcntl, not flock. The latter is a non-standard, deprecated BSD function and does not work with NFS and possibly other filesystems. fcntl locking on the other hand is POSIX standard and is intended to work everywhere.
Now if you want to use file-level reader-writer locking mixed with stdio, it will work, but you have to take some care to ensure that buffering does not break your assumptions about locks. The method I'm about to explain is not the only one, but I believe it's the clearest/simplest:
When you want to operate on one of your files with stdio, obtaining the correct type of lock (read or write, aka shared of exclusive) should be the first thing you do after fopen. Use fileno to get the file descriptor number and apply the lock to it. After that, perform your entire read or write operation. Do not make any attempt to unlock the file; instead, call fclose to close the file and let it be implicitly unlocked when it's closed. Otherwise you may release the lock while unbuffered data is still unwritten, or later read data that was buffered before the lock was released, that's no longer valid after the lock is released.
Hello every one I want to ask a question about flockfile function I was reading the description and came to know that it is used in threads. but I am doing forking which means that there will be different process not threads can I use flockfile with different process does it make any difference?
Thanks
The flockfile function doesn't lock a file but the FILE data structure that a process uses to access a file. So this is about the representation in address space that a process has of the file, not necessarily about the file itself.
Even in a process if you have different FILEs open on the same file, you can write simultaneously to that file, even if you have locked each of the FILEs by means of flockfile.
For locking on the file itself have a look into flock and lockf but beware that the rules of their effects for access files through different threads of the same process are complicated.
These functions can only be used within one process.
From the POSIX docs:
In summary, threads sharing stdio streams with other threads can use flockfile() and funlockfile() to cause sequences of I/O performed by a single thread to be kept bundled.
All the rest of that page talks about mutual exclusion between threads. Different processes will have different input/output buffers for file streams, this locking wouldn't really make sense/be effective.
Is there a problem with using pread on the same file descriptor from 2 or more different threads at the same time?
pread itself is thread-safe, since it is not on the list of unsafe functions. So it is safe to call it.
The real question is: what happens if you read from the same file concurrently (not necessarily from two threads, but also from two processes).
Regarding this, the specification says:
The behavior of multiple concurrent reads on the same pipe, FIFO, or terminal device is unspecified.
Note that it doesn't mention ordinary files. This bit relates only to read anyway, because pread cannot be used on unseekable files.
I/O is intended to be atomic to ordinary files and pipes and FIFOs.
But this is from the non-normative section, so your OS might do it differently. E.g., if you read from two threads and there is a concurrent write, you might get different pieces of the write in your two read buffers. But this kind of problem is not specific to multithreading.
Also nice to know that in some cases
read() shall block the calling thread
Not the process, just the thread. And
A thread that has blocked shall not prevent any unblocked thread [...] from eventually making forward progress
As we are using same fd, we have to bind a lock otherwise there will be mix of data from the two pread on the file descriptor.
Hence yes there is a problem in doing this
http://linux.die.net/man/2/pread
I'm not 100% sure but I think that the file descriptor structure itself isn't thread safe, so two concurrent changes to it would corrupt it. You need some kind of locking.