I understand how the write() call works: it writes the data only to the kernel's buffers, which the kernel later flushes to disk after ordering the writes optimally.
Calling fsync() on a file descriptor makes sure that the data currently sitting in the kernel's buffers is actually written to disk.
My question is whether fsync() should be called before or after the write() call. I've read a couple of books on the topic and looked on the internet as well, but couldn't find a satisfactory answer.
fsync() should be called after the write() system call. It flushes the file's data buffers and its metadata to the physical device.
Alternatively, you can pass the O_SYNC flag to the open() system call for the file and get the same result for each subsequent write() call.
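For example, a minimal sketch of both approaches (the path, payload, and error handling here are illustrative, not part of the original answer):

#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    const char msg[] = "hello\n";

    /* Approach 1: explicit fsync() after write(). */
    int fd = open("/tmp/example.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return 1;
    if (write(fd, msg, sizeof msg - 1) == -1)  /* data lands in kernel buffers */
        return 1;
    if (fsync(fd) == -1)                       /* flush data and metadata to disk */
        return 1;
    close(fd);

    /* Approach 2: O_SYNC makes every subsequent write() synchronous. */
    fd = open("/tmp/example.dat", O_WRONLY | O_APPEND | O_SYNC);
    if (fd == -1)
        return 1;
    write(fd, msg, sizeof msg - 1);            /* returns only once the data is on stable storage */
    close(fd);

    return 0;
}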
Your understanding of write() is slightly off: after the write system call, the data is queued in the kernel's buffers, and the kernel then flushes it out to whatever the file descriptor refers to, whether an open file or a socket.
One thread writes to a file (or even deletes it) while another thread calls sendfile() on that file simultaneously, for overlapping positions.
What is the expected behaviour here?
Also, if my understanding is correct, the sendfile() call will just add a record (file descriptor, position, length) to the socket and return to the caller (no copy to the socket's buffer yet?), so even though sendfile() may return before any modification to the file, the OS depends on the file's contents until it has been sent completely.
So what is the result of any modification made before the file has been completely sent?
The expected behavior is that the result is unpredictable. sendfile() is not atomic; it's effectively equivalent to writing your own loop that calls read() on the file descriptor and write() on the socket descriptor. If some other process writes to the file while this is going on, you'll get a mix of the old and new contents.

Think of it mainly as a convenience function, although it also should be significantly more efficient, since it doesn't require multiple system calls and can copy directly between the file buffer and the socket buffer rather than copying back and forth between application buffers. Because of this, the window of vulnerability should be smaller than with the read/write loop, but it's still there.
To avoid this problem, you should use file locking between the process(es) doing the writing and the one calling sendfile(); a short sketch follows the steps below. So the sequence should be:
lock the file
call sendfile()
unlock the file
and the writing process should do:
lock the file
write to the file
unlock the file
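A minimal sketch of the sending side on Linux, using flock() for the advisory locking (the helper name send_locked(), the shared/exclusive lock split, and the error handling are assumptions for illustration; the writing process would take LOCK_EX around its writes):

#include <fcntl.h>
#include <sys/file.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: send the whole file over sock_fd under a shared lock. */
int send_locked(int sock_fd, const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    struct stat st;
    if (fstat(fd, &st) == -1 || flock(fd, LOCK_SH) == -1) {  /* lock the file */
        close(fd);
        return -1;
    }

    off_t off = 0;
    ssize_t rc = 1;
    while (off < st.st_size && rc > 0)         /* call sendfile() until done */
        rc = sendfile(sock_fd, fd, &off, (size_t)(st.st_size - off));

    flock(fd, LOCK_UN);                        /* unlock the file */
    close(fd);
    return rc < 0 ? -1 : 0;
}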
EDIT:
Actually, it looks like it isn't this simple, because sendfile() links the socket buffer to the file buffer cache, rather than copying it in the kernel. Since sendfile() doesn't wait for the data to be sent, modifying the file after it returns could still affect what gets sent. You need to check that the data has all been received with application-layer acknowledgements. See Minimizing copies when writing large data to a socket for an excerpt from an article that explains these details.
I looked on the Linux man pages for the answer but can't seem to find it. I know that read() is blocking but I'm still not sure about write().
Can anyone point me to any documentation for clarification?
Read the POSIX specifications of read() and write(). See also related functions such as open() and pipe().
It depends on the attributes of the file descriptor you're reading from or writing to (think O_NONBLOCK, for example), and on the underlying file type (disk file vs pipe vs FIFO vs socket vs character or block special), and so on.
Succinctly, both read() and write() can be blocking or non-blocking, depending on circumstances.
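For instance, a sketch of how O_NONBLOCK changes read() on an empty pipe (illustrative only):

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fds[2];
    if (pipe(fds) == -1)
        return 1;

    /* Make the read end non-blocking. */
    int flags = fcntl(fds[0], F_GETFL, 0);
    fcntl(fds[0], F_SETFL, flags | O_NONBLOCK);

    char buf[16];
    ssize_t n = read(fds[0], buf, sizeof buf);  /* pipe is empty */
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        puts("read() returned immediately; without O_NONBLOCK it would have blocked");

    return 0;
}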
I have two (POSIX) threads that write to a log file like this:
pthread_mutex_lock(&log_mutex);
fprintf(LOG, "something...\n");
fsync(fileno(LOG));
pthread_mutex_unlock(&log_mutex);
The file is opened in main() with fopen() in mode "a". While the process is running I can't see anything appearing in the file with cat or tail, although after the process terminates and the file is fclose()-ed, the lines are all there.
What am I doing wrong?
I guess you need to call fflush() to flush the changes to the file system.
See also difference between fsync() and fflush().
Since you are using a FILE handle in C, you need to first flush the data from the C/C++ stdio buffers to the kernel buffers by calling fflush(). fsync() is not really required unless you also want to make sure the data reaches the underlying physical storage, especially for durability concerns.
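Putting the two answers together, the logging sequence from the question would become (LOG and log_mutex as in the question; the fsync() line is only needed if you also want durability):

pthread_mutex_lock(&log_mutex);
fprintf(LOG, "something...\n");
fflush(LOG);              /* stdio buffer -> kernel buffers; makes the line visible to cat/tail */
fsync(fileno(LOG));       /* kernel buffers -> physical storage; optional, for durability */
pthread_mutex_unlock(&log_mutex);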
I'm reading some files from the procfs every few seconds, and displaying the information. Instead of opening and closing the files each time, I'm maintaining open file handles and closing them when I'm finished. The problem is that I'm consistently getting old data. The information gathered from the first read is returned in subsequent reads, and I've confirmed that the procfs files are indeed changing.
The only workaround I've found is to do an fflush() before a rewind() when reading the data. This works, but I don't understand why. I know that if I have two programs reading and writing the same file, an fflush() would be necessary on the PRODUCER side for the changes to be seen by the consumer. Here I'm doing an fflush() on the consumer side and it works. Don't the producer and the consumer have different file handles, so that an fflush() in the consumer does not flush data written by the producer?
Any ideas why I'm getting stale data without fflush(), and up-to-date information using fflush()?
File streams are usually buffered, meaning that data is read from the file into memory in blocks and subsequent reads are served from that in-memory buffer rather than from the file itself. You must ensure that your stream is unbuffered to continually retrieve fresh information from the underlying file. To do that, use setbuf(stream, NULL); to disable buffering (a short sketch follows the links below).
You can read about setbuf here:
http://www.cplusplus.com/reference/clibrary/cstdio/setbuf/
The reason I assumed that your stream is buffered is that fflush(stream) clears a buffered stream.
You can read about that here:
http://www.cplusplus.com/reference/clibrary/cstdio/fflush/
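For instance, a minimal sketch with buffering disabled (the procfs path is just an example; setbuf() must be called before any other I/O on the stream):

#include <stdio.h>

int main(void)
{
    FILE *fp = fopen("/proc/loadavg", "r");   /* example procfs file */
    if (fp == NULL)
        return 1;

    setbuf(fp, NULL);   /* unbuffered: reads go to the file, not to a cached block */

    char line[256];
    if (fgets(line, sizeof line, fp) != NULL)
        fputs(line, stdout);

    fclose(fp);
    return 0;
}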
I don't know the exact answer, but I would imagine that the reason for the behavior you observed is the CONSUMER-side cache. It reads the file in blocks, most likely larger than what you are processing at a time, so the "rest" of the buffer is fed to you when you ask for "more". fflush() makes sure the cached data is discarded before giving you fresh data.
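For comparison, a sketch of the polling loop with the fflush()-before-rewind() workaround in place (this relies on fflush() discarding the buffer of a seekable input stream, which POSIX permits and glibc implements; the path is an example):

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    FILE *fp = fopen("/proc/loadavg", "r");   /* example procfs file */
    if (fp == NULL)
        return 1;

    char line[256];
    for (int i = 0; i < 5; i++) {
        fflush(fp);    /* drop the stale stdio buffer for this input stream */
        rewind(fp);    /* reposition at the start of the file */
        if (fgets(line, sizeof line, fp) != NULL)
            fputs(line, stdout);   /* fresh contents each iteration */
        sleep(1);
    }

    fclose(fp);
    return 0;
}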
I have a new file, opened as read/write. One thread receives from the network and appends binary data to that file; the other thread reads from the same file to process the binary data. But read() always returns 0, so I can't read the data. However, if I append data with cat from the command line, the program can read the data and process it. I don't know why it doesn't notice the new data coming from the network. I'm using open(), read(), and write() in this program.
Use a pipe instead of an HDD-backed file. Depending on your system (which you didn't tell us), there are only minor modifications to your code (which you didn't give us) to do that.
File operations are buffered. Try flushing the stream?
Assuming that your read() and write() functions are the POSIX ones, they share the file position, even when used from different threads. So your read after the write was trying to read from the position at which write() had just finished writing, i.e. at end of file, which is why read() returned 0. Don't use file IO to communicate between threads. In most contexts I'd not even use pipes or sockets for that (one context where I would is when the reading thread is using poll/select with other file descriptors), but simple shared memory and a mutex.
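A minimal sketch of that last suggestion with pthreads (a mutex plus a condition variable guarding a shared buffer; the names are illustrative and the real network-receive loop is omitted; compile with -pthread):

#include <pthread.h>
#include <stdio.h>
#include <string.h>

static pthread_mutex_t lock  = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  ready = PTHREAD_COND_INITIALIZER;
static char   shared_buf[256];
static size_t shared_len = 0;   /* 0 means "no data pending" */

/* Receiving thread: pretend this chunk arrived from the network. */
static void *producer(void *arg)
{
    (void)arg;
    const char chunk[] = "binary data from the network";
    pthread_mutex_lock(&lock);
    memcpy(shared_buf, chunk, sizeof chunk);
    shared_len = sizeof chunk;
    pthread_cond_signal(&ready);    /* wake the processing thread */
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Processing thread: sleep until data arrives instead of polling a file. */
static void *consumer(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    while (shared_len == 0)
        pthread_cond_wait(&ready, &lock);
    printf("got %zu bytes\n", shared_len);
    shared_len = 0;
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void)
{
    pthread_t p, c;
    pthread_create(&c, NULL, consumer, NULL);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}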