Suppose I write a block to a file descriptor without doing fsync and then read the same block from the same descriptor some time later. Is it guaranteed that I will receive the same information?
The program is single-threaded and no other process will access the file at any time.
Yes, it is guaranteed by the operating system.
Even if the modifications have not made it to disk yet, the OS serves reads from its buffer cache, which reflects all file modifications, and it guarantees this consistency for reads and writes to ALL processes. So not only your process, but any other process, would be able to see the changes.
As for fsync(), it only instructs the operating system to flush the file's contents to disk. See also fdatasync().
Also, I suggest you use two file descriptors: one for reading, another for writing.
fsync() synchronizes cache and disk. Since the data is already in the cache, it will be read from there instead of from disk.
When you write to a file descriptor, the data is stored in RAM caches and buffers before being sent to disk, so as long as you don't close the descriptor, you can read back the data you just wrote. If you close the descriptor, the file contents will reach disk either because you flush them yourself or because the OS eventually does it on its own schedule, BUT if you want to be sure the just-written data is on disk before accessing it through a new FD, you MUST flush to disk with fsync().
Related
What is the difference between fsync and syncfs?
int syncfs(int fd);
int fsync(int fd);
The manpage for fsync says the following:
fsync() transfers ("flushes") all modified in-core data of (i.e., modified buffer cache pages for) the file referred to by the file descriptor fd to the disk device (or other permanent storage device) so that all changed information can be retrieved even after the system crashed or was rebooted. This includes writing through or flushing a disk cache if present. The call blocks until the device reports that the transfer has completed. It also flushes metadata information associated with the file (see stat(2)).
The manpage for syncfs says the following:
sync() causes all buffered modifications to file metadata and data to be written to the underlying filesystems.
syncfs() is like sync(), but synchronizes just the filesystem containing file referred to by the open file descriptor fd.
To me, both seem equal: they synchronize the file referred to by the file descriptor and the associated metadata.
First, fsync() (and sync()) are POSIX-standard functions while syncfs() is Linux-only.
So availability is one big difference.
From the POSIX standard for fsync():
The fsync() function shall request that all data for the open file
descriptor named by fildes is to be transferred to the storage
device associated with the file described by fildes. The nature of
the transfer is implementation-defined. The fsync() function shall
not return until the system has completed that action or until an
error is detected.
Note that it's just a request.
From the POSIX standard for sync():
The sync() function shall cause all information in memory that
updates file systems to be scheduled for writing out to all file
systems.
The writing, although scheduled, is not necessarily complete upon
return from sync().
Again, that's not something guaranteed to happen.
The Linux man page for syncfs() (and sync()) states
sync() causes all pending modifications to filesystem metadata and
cached file data to be written to the underlying filesystems.
syncfs() is like sync(), but synchronizes just the filesystem
containing file referred to by the open file descriptor fd.
Note that when the function returns is unspecified.
The Linux man page for fsync() states:
fsync() transfers ("flushes") all modified in-core data of (i.e.,
modified buffer cache pages for) the file referred to by the file
descriptor fd to the disk device (or other permanent storage device)
so that all changed information can be retrieved even if the system
crashes or is rebooted. This includes writing through or flushing a
disk cache if present. The call blocks until the device reports that
the transfer has completed.
As well as flushing the file data, fsync() also flushes the metadata
information associated with the file (see inode(7)).
Calling fsync() does not necessarily ensure that the entry in the
directory containing the file has also reached disk. For that an
explicit fsync() on a file descriptor for the directory is also
needed.
Note that the guarantees Linux provides for fsync() are much stronger than those provided for sync() or syncfs(), and by POSIX for both fsync() and sync().
In summary:
POSIX fsync(): "please write data for this file to disk"
POSIX sync(): "write all data to disk when you get around to it"
Linux sync(): "write all data to disk (when you get around to it?)"
Linux syncfs(): "write all data for the filesystem associated with this file to disk (when you get around to it?)"
Linux fsync(): "write all data and metadata for this file to disk, and don't return until you do"
Note that the Linux man page doesn't specify when sync() and syncfs() return.
I think the current answer is not complete. For Linux:
According to the standard specification (e.g., POSIX.1-2001), sync()
schedules the writes, but may return before the actual writing is
done. However Linux waits for I/O completions, and thus sync() or
syncfs() provide the same guarantees as fsync called on every file
in the system or filesystem respectively.
and
Before version 1.3.20 Linux did not wait for I/O to complete before
returning.
This is mentioned on the sync(2) page in the "notes" and "bugs" sections.
When I issue write(), my data goes to some kernel-space buffers. The actual commit to the physical layer ("phy-commit") is likely deferred, until... exactly which events?
When I issue a close() for a file descriptor, then
If [...], the resources associated with the open file description are freed
Does it mean releasing (freeing) those kernel buffers which contained my data? What will happen to my precious data, contained in those buffers? Will it be lost?
How to prevent that loss?
Via fsync()? It requests an explicit phy-commit. I suppose it happens either immediately (a synchronous call) or is deferred only "for a short time", queued to precede subsequent operations, at least destructive ones.
But I do not quite want immediate or urgent phy-commit. Only (retaining my data and) not to forget doing phy-commit later in some time.
From man fclose:
The fclose() function [...] closes the underlying file descriptor.
...
fclose() flushes only the user-space buffers provided by the C library. To ensure that the data is physically stored on disk the kernel buffers must be flushed too, for example, with sync(2) or fsync(2).
It may suggest that fsync() does not have to precede close() (or fclose(), which includes a close()), but can, or even must, come after it. So close() cannot be very destructive...
Does it mean releasing (freeing) those kernel buffers which contained my data? What will happen to my precious data, contained in those buffers? Will it be lost?
No. The kernel buffers will not be freed before the data is written to the underlying file. So there won't be any data loss (unless something goes really wrong, such as a power outage).
Whether that data will be immediately written to the physical file is another question. It may depend on the filesystem (which may be buffering) and/or any hardware caching as well.
As far as your user program is concerned, a successful close() call can be considered a successful write to the file.
It may suggest that fsync() does not have to precede close() (or fclose(), which includes a close()), but can, or even must, come after it. So close() cannot be very destructive...
After a call to close(), the state of the file descriptor is left unspecified by POSIX (regardless of whether close() succeeded). So you are not allowed to use fsync(fd); after calling close().
See: POSIX/UNIX: How to reliably close a file descriptor.
And no, it doesn't suggest close() can be destructive. It suggests that the C library may be doing its own buffering in user space, and that fsync() should be used to flush the kernel buffers (and then we are in the same position as described before).
I keep on reading that fread() and fwrite() are buffered library calls. In the case of fwrite(), I understood that when we write to the file, the data isn't written to the hard disk immediately; it fills an internal buffer, and once the buffer is full, the write() system call is made to actually write the data to the file.
But I am not able to understand how this buffering works in the case of fread(). Does "buffered" in the case of fread() mean that when we call fread(), it reads more data than we originally asked for, and the extra data is stored in the buffer (so that when a second fread() occurs, it can be served directly from the buffer instead of going to the hard disk)?
I also have the following queries.
If fread() works as I described above, will the first fread() call read an amount of data equal to the size of the internal buffer? If so, and my fread() call asks for more bytes than the internal buffer size, what will happen?
If fread() works as I described above, that means at least one read() system call to the kernel will definitely happen for fread(). But in the case of fwrite(), if we call fwrite() only once during the program's execution, we can't say for sure that the write() system call will be made. Is my understanding correct?
Will the internal buffer be maintained by the OS?
Does fclose() flush the internal buffer?
There is buffering or caching at many different levels in a modern system. This might be typical:
C standard library
OS kernel
disk controller (esp. if using hardware RAID)
disk drive
When you use fread(), it may request 8 KB or so if you asked for less. This will be stored in user-space so there is no system call and context switch on the next sequential read.
The kernel may read ahead also; there are library functions to give it hints on how to do this for your particular application. The OS cache could be gigabytes in size since it uses main memory.
The disk controller may read ahead too, and could have a cache size up to hundreds of megabytes on smallish systems. It can't do as much in terms of read-ahead, because it doesn't know where the next logical block is for the current file (indeed it doesn't even know what file it is reading).
Finally, the disk drive itself has a cache, perhaps 16 MB or so. Like the controller, it doesn't know what file it is reading. For many years one disk block was 512 bytes, but it got a little larger (a few KB) recently with multi-terabyte disks.
When you call fclose(), it will probably deallocate the user-space buffer, but not the others.
Your understanding is correct. And any buffered fwrite data will be flushed when the FILE* is closed. The buffered I/O is mostly transparent for I/O on regular files.
But for terminals and other character devices you may care. Another instance where buffered I/O may be an issue is if you read from the file that one process is writing to from another process -- a common example is if a program writes text to a log file during operation, and the user runs a command like tail -f program.log to watch the content of the log file live. If the writing process has buffering enabled and it doesn't explicitly flush the log file, it will make it difficult to monitor the log file.
I checked man 2 sync
It shows sync and syncfs
void sync(void);
void syncfs(int fd);
syncfs is easy to understand. An fd is given and the data of that fd is written completely to underlying file systems.
What is it with sync?
sync() causes all buffered modifications to file metadata and data to be written to the underlying file systems.
Is it that all the buffers in the system are written to the fs? Or is it that all the files opened by this process are written to the fs? I did not quite understand "buffered modifications to file metadata".
Whenever you issue a write, send, write to file-backed mappings or similar things the kernel is not forced to flush that data straight to persistent storage, the underlying network stack, etc... This buffering is done for performance reasons.
sync instructs the kernel to do exactly this. Empty all buffers.
I know that when you call fwrite or fprintf or rather any other function that writes to a file, the contents aren't immediately flushed to the disk, but buffered in the memory.
Firstly, where does the OS manage these buffers, and how? Secondly, if you write to a file and later read back the content you wrote, and assuming the OS didn't flush the contents between the write and the read, how does it know it has to serve the read from the buffer? How does it handle this situation?
The reason I want to know this is that I'm interested in implementing my own buffering scheme in user space, rather than in kernel space as done by the OS. That is, a write to a file would be buffered in user space and the actual write would only occur at a certain point. Consequently, I also need to handle situations where a read is issued for content that is still in the buffer. Is it possible to do all this in user space?
Firstly, where do the OS manage these buffers and how
The functions fwrite and fprintf use stdio buffers which already are completely in userspace. The buffers are (likely) static arrays or perhaps malloced memory.
how it knows that it has to return the read from the buffer
It doesn't, so the changes aren't seen. Nothing actually happens to a file until the underlying system call (write) is called (and even then - read on).
Is it possible to do all this in user-space
No, it's not possible. The good news is that the kernel already has buffers, so not every write you do is actually translated into a write to the file. It is postponed and executed later. If in the meantime somebody tries to read from the file, the kernel is smart enough to serve them from the buffer.
Bits from TLPI:
When working with disk files, the read() and write() system calls don't directly initiate disk access. Instead, they simply copy data between a user-space buffer and a buffer in the kernel buffer cache.
When performing I/O on a disk file, a successful return from write()
doesn’t guarantee that the data has been transferred to disk, because
the kernel performs buffering of disk I/O in order to reduce disk
activity and expedite write() calls.
At some later point, the kernel writes (flushes) its buffer to the
disk.
If, in the interim, another process attempts to read these bytes of
the file, then the kernel automatically supplies the data from the
buffer cache, rather than from (the outdated contents of) the file.
So you may want to find out about sync and fsync.
Multiple levels of buffering are generally bad. The reason stdio buffers are useful is that they minimize the number of system calls performed. If system calls were cheaper, nobody would use stdio buffers anymore.